SCORING.md: 10 additions & 8 deletions
@@ -143,14 +143,16 @@ This behavior does **not** apply when:
Not all warnings represent the same degree of degradation. A warning on `llms-txt-valid` (structure is non-standard but links are parseable) is less severe than a warning on `rendering-strategy` (sparse content that might need JavaScript). Most checks have a specific warn coefficient:
|**0.25**| Actively steering agents to a worse path |`llms-txt-links-markdown` (markdown exists but llms.txt links to HTML; agents don't discover .md variants on their own) |
-
-`markdown-code-fence-validity` only has pass/fail (no warn state). `http-status-codes` is normally pass/fail but warns when every sampled response is indeterminate (HTTP 202 from CDN cache-miss/build, or 5xx) so the check couldn't measure bad-URL handling.
|**0.25**| Actively steering agents to a worse path |`llms-txt-links-markdown` (markdown exists but llms.txt links to HTML; agents don't discover .md variants on their own) |
+
+`markdown-code-fence-validity` only has pass/fail (no warn state).
+
+† `http-status-codes` is normally pass/fail. It warns only when every sampled response is indeterminate (HTTP 202 from CDN cache-miss/build, or 5xx), meaning bad-URL handling couldn't be measured. In that case the check applies the default 0.5 warn coefficient rather than scoring zero. Mixed responses (e.g., some `correct-error`, some `indeterminate`) are scored from the determinate subset only.
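
As a rough illustration of how a warn coefficient might enter a score, here is a minimal Python sketch. Only the 0.25 coefficient for `llms-txt-links-markdown` and the default of 0.5 come from the text above; the function name `status_multiplier`, the dictionary layout, and the assumption that a pass counts as 1.0 are hypothetical and not taken from the AFDocs source.

```python
# Default warn coefficient, as stated in the text above.
DEFAULT_WARN_COEFFICIENT = 0.5

# Only the coefficient quoted in this section is listed here; the real
# table has more entries that are not reproduced in this sketch.
WARN_COEFFICIENTS = {
    "llms-txt-links-markdown": 0.25,
}


def status_multiplier(check_id: str, status: str) -> float:
    """Return the fraction of a check's weight earned for a given status.

    Assumption: a pass earns the full weight (1.0) and a fail earns none.
    """
    if status == "pass":
        return 1.0
    if status == "fail":
        return 0.0
    if status == "warn":
        # Checks without a specific coefficient fall back to the default.
        return WARN_COEFFICIENTS.get(check_id, DEFAULT_WARN_COEFFICIENT)
    raise ValueError(f"unknown status: {status!r}")
```

Under these assumptions, an all-indeterminate `http-status-codes` run would earn half of the check's weight instead of none.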
| Pass | Fabricated bad URLs return proper 4xx status codes |
+| Warn | Every sampled response was indeterminate (HTTP 202 or 5xx); bad-URL handling is unmeasured |
+| Fail | Bad URLs return 200 (soft 404) |

-This check has no warn state; it's strictly pass/fail.
+AFDocs tests this by generating non-existent URLs based on your site's URL structure and checking whether the server returns 404 or 200. Per-page responses fall into one of three buckets:

-AFDocs tests this by generating non-existent URLs based on your site's URL structure and checking whether the server returns 404 or 200.
+- **`correct-error`** (counts toward pass): 4xx status code.
+- **`soft-404`** (counts toward fail): 2xx/3xx status code, often a templated "page not found" page.
+- **`indeterminate`** (excluded from the soft-404 tally): HTTP 202 or 5xx. RFC 7231 says 202 means "still processing," and Vercel/Next.js ISR returns it during cache-miss/build for fresh URLs. 5xx responses tell us nothing about how the site handles bad URLs. Both are reported separately rather than penalized as soft 404s.
+
+If at least one response is determinate, the check scores from the determinate subset (e.g., 2 correct-error + 1 indeterminate scores as 2/2 = pass). The warn state only fires when **every** sampled response is indeterminate, in which case the check applies the default 0.5 warn coefficient because bad-URL handling could not be measured.
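
The bucketing and scoring rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not the AFDocs implementation: the function names `classify` and `check_state` are hypothetical, and two details are assumptions that the text does not settle (status codes outside the 2xx to 5xx range are treated as indeterminate, and any soft 404 in the determinate subset fails the check).

```python
from collections import Counter
from typing import Iterable


def classify(status_code: int) -> str:
    """Map one HTTP status code to a bucket described above."""
    # 202 is checked first so it is not swallowed by the 2xx range.
    if status_code == 202 or 500 <= status_code <= 599:
        return "indeterminate"
    if 400 <= status_code <= 499:
        return "correct-error"
    if 200 <= status_code <= 399:
        return "soft-404"
    # Assumption: anything else (e.g. 1xx) cannot be measured either.
    return "indeterminate"


def check_state(status_codes: Iterable[int]) -> str:
    """Reduce the sampled responses to pass, warn, or fail."""
    buckets = Counter(classify(code) for code in status_codes)
    if not buckets:
        raise ValueError("no responses were sampled")
    determinate = buckets["correct-error"] + buckets["soft-404"]
    if determinate == 0:
        # Every response was indeterminate: handling is unmeasured.
        return "warn"
    # Assumption: a single soft 404 in the determinate subset fails.
    return "fail" if buckets["soft-404"] > 0 else "pass"
```

With these definitions, `check_state([404, 404, 202])` mirrors the "2 correct-error + 1 indeterminate" example and evaluates to `"pass"`, while `check_state([202, 503])` evaluates to `"warn"`.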
### How to fix
Configure your server or hosting platform to return a 404 status code for pages that don't exist. Most docs platforms handle this correctly by default; the common exception is single-page applications that serve the shell HTML for all routes and handle 404s client-side.

+**If this check warns** with "all sampled pages returned indeterminate responses," the most common causes are:
+
+- **Vercel/Next.js ISR** returning 202 during cache-miss or build. Real agents (low concurrency, warm cache) typically don't hit this, so it's noise rather than signal. No action needed.
+- **A misconfigured server returning 5xx for missing paths** (e.g., an Apache rewrite rule that maps `/foo` to `/foo.html` without checking that the target file exists, then loops or hits an internal error). This is a real issue: agents requesting a typo'd URL get a 500 instead of a clean 404. Add a guard so the rewrite only fires when the target exists, and set an `ErrorDocument 404` directive that points at your platform's 404 page.
+
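
To confirm a server change, one option is to request a path that should not exist and look at the status code that comes back. The following is a minimal Python sketch using only the standard library; it is not how AFDocs generates its test URLs. The helper names and the `afdocs-probe-` path prefix are hypothetical.

```python
import urllib.error
import urllib.request
import uuid


def fabricated_url(base_url: str) -> str:
    """Build a URL under base_url that is very unlikely to exist."""
    # Hypothetical prefix; the random suffix avoids cached responses.
    return base_url.rstrip("/") + "/afdocs-probe-" + uuid.uuid4().hex


def probe_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url.

    urlopen follows redirects, so a 3xx is reported as the status of
    the final response rather than as the redirect itself.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the code is still the answer we want.
        return err.code
```

Calling `probe_status(fabricated_url("https://docs.example.com"))` (a placeholder host) should return 404 on a correctly configured site; a 200 suggests a soft 404, and a 5xx points at the rewrite problem described above.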
### What about serving helpful content on missing pages?
It's tempting to serve something useful when an agent requests a page that doesn't exist. For example, you might return your `llms.txt` as a fallback, or a "did you mean?" page with links to related content. This seems like an elegant solution to agents hallucinating URLs.