Commit b3317cd

Merge pull request #88 from agent-ecosystem/docs/url-stability-updates
Clarify `http-status-codes` behavior, fix site 500 vs. 404
2 parents bbc5b6f + 98f3edc commit b3317cd

3 files changed

Lines changed: 32 additions & 14 deletions

SCORING.md

Lines changed: 10 additions & 8 deletions
@@ -143,14 +143,16 @@ This behavior does **not** apply when:
143143

144144
Not all warnings represent the same degree of degradation. A warning on `llms-txt-valid` (structure is non-standard but links are parseable) is less severe than a warning on `rendering-strategy` (sparse content that might need JavaScript). Most checks have a specific warn coefficient:
145145

146-
| Coefficient | Meaning | Checks |
147-
| ----------- | ---------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
148-
| **0.75** | Content substantively intact | `llms-txt-valid`, `content-negotiation`, `llms-txt-links-resolve`, `llms-txt-coverage`, `markdown-content-parity` |
149-
| **0.60** | Partial coverage or platform-dependent | `llms-txt-directive-html`, `llms-txt-directive-md`, `redirect-behavior` |
150-
| **0.50** | Genuine functional degradation | `llms-txt-exists`, `llms-txt-size`, `rendering-strategy`, `markdown-url-support`, `page-size-markdown`, `page-size-html`, `content-start-position`, `tabbed-content-serialization`, `section-header-quality`, `cache-header-hygiene`, `auth-gate-detection`, `auth-alternative-access` |
151-
| **0.25** | Actively steering agents to a worse path | `llms-txt-links-markdown` (markdown exists but llms.txt links to HTML; agents don't discover .md variants on their own) |
152-
153-
`markdown-code-fence-validity` only has pass/fail (no warn state). `http-status-codes` is normally pass/fail but warns when every sampled response is indeterminate (HTTP 202 from CDN cache-miss/build, or 5xx) so the check couldn't measure bad-URL handling.
146+
| Coefficient | Meaning | Checks |
147+
| ----------- | ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
148+
| **0.75** | Content substantively intact | `llms-txt-valid`, `content-negotiation`, `llms-txt-links-resolve`, `llms-txt-coverage`, `markdown-content-parity` |
149+
| **0.60** | Partial coverage or platform-dependent | `llms-txt-directive-html`, `llms-txt-directive-md`, `redirect-behavior` |
150+
| **0.50** | Genuine functional degradation | `llms-txt-exists`, `llms-txt-size`, `rendering-strategy`, `markdown-url-support`, `page-size-markdown`, `page-size-html`, `content-start-position`, `tabbed-content-serialization`, `section-header-quality`, `cache-header-hygiene`, `auth-gate-detection`, `auth-alternative-access`, `http-status-codes` |
151+
| **0.25** | Actively steering agents to a worse path | `llms-txt-links-markdown` (markdown exists but llms.txt links to HTML; agents don't discover .md variants on their own) |
152+
153+
`markdown-code-fence-validity` only has pass/fail (no warn state).
154+
155+
`http-status-codes` is normally pass/fail. It warns only when every sampled response is indeterminate (HTTP 202 from CDN cache-miss/build, or 5xx), meaning bad-URL handling couldn't be measured. In that case the check applies the default 0.5 warn coefficient rather than scoring zero. Mixed responses (e.g., some `correct-error`, some `indeterminate`) are scored from the determinate subset only.
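
As a rough illustration of how a warn coefficient might turn a check result into partial credit, here is a minimal Python sketch. The function name `credit_fraction`, the dictionary layout, and the convention that a pass earns 1.0 are assumptions made for this example, not AFDocs internals. Only a few rows of the table are copied in; any check not listed falls back to the default 0.5.

```python
# Hypothetical sketch: map a check's status to the fraction of its weight earned.
# The coefficients are copied from the table; the names and structure are assumed.
DEFAULT_WARN_COEFFICIENT = 0.5

WARN_COEFFICIENTS = {
    "llms-txt-valid": 0.75,
    "content-negotiation": 0.75,
    "redirect-behavior": 0.60,
    "http-status-codes": 0.50,
    "llms-txt-links-markdown": 0.25,
}


def credit_fraction(check_id: str, status: str) -> float:
    """Return the share of a check's weight earned for a given status."""
    if status == "pass":
        return 1.0  # assumption: a pass earns full credit
    if status == "fail":
        return 0.0  # a fail scores zero
    if status == "warn":
        # Checks without a specific coefficient use the default.
        return WARN_COEFFICIENTS.get(check_id, DEFAULT_WARN_COEFFICIENT)
    raise ValueError(f"unknown status: {status!r}")
```

Under these assumptions, a warning on `http-status-codes` earns half credit instead of being treated as a failure.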
154156

155157
## Score caps
156158

docs/checks/url-stability.md

Lines changed: 16 additions & 6 deletions
@@ -17,19 +17,29 @@ In empirical testing, soft 404s (pages returning 200 with "page not found" conte
1717

1818
### Results
1919

20-
| Result | Condition |
21-
| ------ | -------------------------------------------------- |
22-
| Pass | Fabricated bad URLs return proper 4xx status codes |
23-
| Fail | Bad URLs return 200 (soft 404) |
20+
| Result | Condition |
21+
| ------ | ------------------------------------------------------------------------------------------ |
22+
| Pass | Fabricated bad URLs return proper 4xx status codes |
23+
| Warn | Every sampled response was indeterminate (HTTP 202 or 5xx); bad-URL handling is unmeasured |
24+
| Fail | Bad URLs return 200 (soft 404) |
2425

25-
This check has no warn state; it's strictly pass/fail.
26+
AFDocs tests this by generating non-existent URLs based on your site's URL structure and checking whether the server returns 404 or 200. Per-page responses fall into one of three buckets:
2627

27-
AFDocs tests this by generating non-existent URLs based on your site's URL structure and checking whether the server returns 404 or 200.
28+
- **`correct-error`** (counts toward pass): 4xx status code.
29+
- **`soft-404`** (counts toward fail): 2xx (other than 202) or 3xx status code, often a templated "page not found" page.
30+
- **`indeterminate`** (excluded from the soft-404 tally): HTTP 202 or 5xx. RFC 9110 (which obsoletes RFC 7231) defines 202 Accepted as "accepted for processing, but the processing has not been completed," and Vercel/Next.js ISR returns it during cache-miss/build for fresh URLs. 5xx responses tell us nothing about how the site handles bad URLs. Both are reported separately rather than penalized as soft 404s.
31+
32+
If at least one response is determinate, the check scores from the determinate subset (e.g., 2 correct-error + 1 indeterminate scores as 2/2 = pass). The warn state only fires when **every** sampled response is indeterminate, in which case the check applies the default 0.5 warn coefficient because bad-URL handling could not be measured.
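
The bucketing and scoring rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AFDocs implementation: the names `classify` and `evaluate` are hypothetical, and the rule that any `soft-404` in the determinate subset fails the check is an assumption, since the text only states that 2 correct-error + 1 indeterminate is a pass.

```python
from collections import Counter


def classify(status: int) -> str:
    """Bucket the HTTP status returned for one fabricated (non-existent) URL."""
    if status == 202 or 500 <= status <= 599:
        return "indeterminate"
    if 400 <= status <= 499:
        return "correct-error"
    if 200 <= status <= 399:
        return "soft-404"
    # Assumption: anything else (e.g., 1xx) also tells us nothing useful.
    return "indeterminate"


def evaluate(statuses: list[int]) -> str:
    """Return 'pass', 'warn', or 'fail' for a sample of bad-URL responses."""
    if not statuses:
        raise ValueError("no responses sampled")
    buckets = Counter(classify(s) for s in statuses)
    determinate = buckets["correct-error"] + buckets["soft-404"]
    if determinate == 0:
        # Every response was indeterminate: bad-URL handling is unmeasured.
        return "warn"
    # Assumption: a single soft 404 among determinate responses fails the check.
    return "pass" if buckets["soft-404"] == 0 else "fail"
```

For example, `evaluate([404, 404, 202])` ignores the 202 and passes on the two 404s, while `evaluate([202, 500])` warns because nothing determinate was observed.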
2833

2934
### How to fix
3035

3136
Configure your server or hosting platform to return a 404 status code for pages that don't exist. Most docs platforms handle this correctly by default; the common exception is single-page applications that serve the shell HTML for all routes and handle 404s client-side.
3237

38+
**If this check warns** with "all sampled pages returned indeterminate responses," the most common causes are:
39+
40+
- **Vercel/Next.js ISR** returning 202 during cache-miss or build. Real agents (low concurrency, warm cache) typically don't hit this, so it's noise rather than signal. No action needed.
41+
- **A misconfigured server returning 5xx for missing paths** (e.g., an Apache rewrite rule that maps `/foo` to `/foo.html` without checking that the target file exists, then loops or hits an internal error). This is a real issue: agents requesting a typo'd URL get a 500 instead of a clean 404. Add a guard so the rewrite only fires when the target exists, and set an `ErrorDocument 404` directive that points at your platform's 404 page.
42+
3343
### What about serving helpful content on missing pages?
3444

3545
It's tempting to serve something useful when an agent requests a page that doesn't exist. For example, you might return your `llms.txt` as a fallback, or a "did you mean?" page with links to related content. This seems like an elegant solution to agents hallucinating URLs.

docs/public/.htaccess

Lines changed: 6 additions & 0 deletions
@@ -40,10 +40,16 @@ RewriteRule ^llms\.txt$ /log-agent-signal.php?path=llms.txt&trigger=llms-txt [L,
4040
# VitePress builds non-index pages as flat .html files (quick-start.html),
4141
# not directories (quick-start/index.html). This rule maps trailing-slash
4242
# URLs to their .html counterparts so the directive check can fetch them.
43+
# Guard with a -f check on the .html target so missing paths fall through
44+
# to a real 404 (via ErrorDocument below) rather than looping into a 500.
4345
RewriteCond %{REQUEST_FILENAME} !-f
4446
RewriteCond %{REQUEST_FILENAME} !-d
47+
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
4548
RewriteRule ^(.*?)/?$ /$1.html [L]
4649

50+
# Serve the VitePress 404 page body for missing paths and return a real 404.
51+
ErrorDocument 404 /404.html
52+
4753
# Serve .md files with the correct content type
4854
AddType text/markdown .md
4955
