Commit 2d99bcf

Merge pull request #76 from agent-ecosystem/add-single-page-scoring-cap
Fix: cap scores for sites with single-page sample
2 parents 8de94a7 + cff3b8f commit 2d99bcf

6 files changed

Lines changed: 87 additions & 13 deletions

SCORING.md

Lines changed: 3 additions & 0 deletions
@@ -168,6 +168,7 @@ Some problems are severe enough that no amount of other good behavior should com
 | `auth-gate-detection`: 75%+ of pages require auth | 39 (F) | Most documentation is inaccessible. |
 | `auth-gate-detection`: 50%+ of pages require auth | 59 (D) | Significant documentation is inaccessible. |
 | `no-viable-path` diagnostic fires (see below) | 39 (F) | Agents have no effective way to access content at all. |
+| `single-page-sample` diagnostic fires (see below) | 59 (D) | Too few pages discovered to produce a representative site-wide score. |

 When multiple caps apply, the lowest one wins.

@@ -243,6 +244,8 @@ Some problems only become visible when you look at multiple checks together. The

 **What it means**: Page-level category scores (page size, content structure, URL stability, etc.) are based on too few pages to be representative. These categories are marked as N/A in the score.

+**Score impact**: This diagnostic caps the score at 59 (D). With page-level checks excluded, the remaining signal is too narrow to support a higher grade.
+
 **What to do**: If your site has an llms.txt, ensure it contains working links so the tool can discover more pages. If testing a preview deployment, use `--canonical-origin` to rewrite cross-origin llms.txt links. You can also provide specific pages with `--urls`.

 ### All llms.txt links are cross-origin

docs/agent-score-calculation.md

Lines changed: 10 additions & 8 deletions
@@ -147,14 +147,15 @@ Two checks have no warn state and are strictly pass/fail: `http-status-codes` an

 Some problems are severe enough that no amount of other passing checks should compensate. When AFDocs detects a critical issue, we cap the score regardless of how well everything else performs.

-| Condition | Cap | Why |
-| ------------------------------------------------------------------------------------- | ------ | ------------------------------------------------------ |
-| `llms-txt-exists` fails | 59 (D) | Agents lose their primary navigation mechanism. |
-| `rendering-strategy`: proportion ≤ 0.25 | 39 (F) | Most content is invisible to agents. |
-| `rendering-strategy`: proportion ≤ 0.50 | 59 (D) | Significant content is invisible. |
-| `auth-gate-detection`: 75%+ pages gated | 39 (F) | Most documentation is inaccessible. |
-| `auth-gate-detection`: 50%+ pages gated | 59 (D) | Significant documentation is inaccessible. |
-| [No viable path](/interaction-diagnostics#no-viable-path-to-content) diagnostic fires | 39 (F) | Agents have no effective way to access content at all. |
+| Condition | Cap | Why |
+| ------------------------------------------------------------------------------------- | ------ | ----------------------------------------------------------- |
+| `llms-txt-exists` fails | 59 (D) | Agents lose their primary navigation mechanism. |
+| `rendering-strategy`: proportion ≤ 0.25 | 39 (F) | Most content is invisible to agents. |
+| `rendering-strategy`: proportion ≤ 0.50 | 59 (D) | Significant content is invisible. |
+| `auth-gate-detection`: 75%+ pages gated | 39 (F) | Most documentation is inaccessible. |
+| `auth-gate-detection`: 50%+ pages gated | 59 (D) | Significant documentation is inaccessible. |
+| [No viable path](/interaction-diagnostics#no-viable-path-to-content) diagnostic fires | 39 (F) | Agents have no effective way to access content at all. |
+| [Single-page sample](/interaction-diagnostics#single-page-sample) diagnostic fires | 59 (D) | Too few pages discovered to produce a representative score. |

 When multiple caps apply, the lowest one wins.

@@ -169,6 +170,7 @@ When automatic page discovery finds fewer than 5 pages (using `random` or `deter
 - **Page-level checks** (those that test sampled pages like `page-size-html`, `rendering-strategy`, `http-status-codes`, etc.) are marked as "not applicable" and excluded from the score.
 - **Site-level checks** (llms.txt checks, coverage, auth-alternative-access) are scored normally.
 - **Category scores** where all checks are not applicable display as a dash instead of a number.
+- **The overall score is capped at 59 (D)**, since the remaining numerator covers only a narrow slice of site-wide signal and shouldn't drive a higher grade on its own.

 This typically happens when a site has no llms.txt or its llms.txt links point to a different origin (common with preview deployments). A [`single-page-sample` diagnostic](/interaction-diagnostics#single-page-sample) fires to explain the situation.
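
Reviewer note: the behavior documented in this hunk (drop page-level checks when automatic sampling finds too few pages, then cap the result at 59) can be sketched roughly as below. This is an illustration under stated assumptions, not the project's scoring code: the `Check` shape, the `points`/`weight` fields, and the `sketchScore` helper are hypothetical, and the threshold of 5 is taken from the prose in this hunk rather than from the real `MIN_PAGES_FOR_SCORING` constant.

```typescript
// Threshold taken from the prose ("fewer than 5 pages"); the constant's real
// value in the codebase is an assumption here.
const MIN_PAGES_FOR_SCORING = 5;

type Sampling = 'random' | 'deterministic' | 'curated' | 'none';

// Hypothetical, simplified check record used only for this sketch.
interface Check {
  id: string;
  pageLevel: boolean; // true for checks that test sampled pages
  points: number; // points earned, between 0 and weight
  weight: number; // maximum points available
}

function sketchScore(checks: Check[], testedPages: number, sampling: Sampling): number {
  // Only automatic discovery (random/deterministic) triggers the exclusion.
  const automatic = sampling === 'random' || sampling === 'deterministic';
  const tooFewPages = automatic && testedPages < MIN_PAGES_FOR_SCORING;

  // Page-level checks leave both the numerator and the denominator.
  const scored = tooFewPages ? checks.filter((c) => !c.pageLevel) : checks;
  const earned = scored.reduce((sum, c) => sum + c.points, 0);
  const possible = scored.reduce((sum, c) => sum + c.weight, 0);
  const raw = possible === 0 ? 0 : Math.round((earned / possible) * 100);

  // The narrow remaining signal cannot push the score above 59 (D).
  return tooFewPages ? Math.min(raw, 59) : raw;
}

const checks: Check[] = [
  { id: 'llms-txt-exists', pageLevel: false, points: 10, weight: 10 },
  { id: 'page-size-html', pageLevel: true, points: 0, weight: 10 },
];

console.log(sketchScore(checks, 1, 'random')); // 59 (raw 100, capped)
console.log(sketchScore(checks, 10, 'random')); // 50 (both checks count)
```

With a single sampled page, the only surviving check passes, so the raw score would be 100 without the cap. That is the inflated-grade case this commit addresses.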

docs/interaction-diagnostics.md

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ These diagnostics appear in the "Interaction Diagnostics" section of the `--form

 This diagnostic does not fire when you explicitly choose pages with `--urls`, `--sampling curated`, or `--sampling none`.

-**Score impact**: Page-level checks are excluded from the overall score and their categories show as N/A. Only site-level checks (llms.txt checks, coverage, auth-alternative-access) contribute to the score.
+**Score impact**: Page-level checks are excluded from the overall score and their categories show as N/A. Only site-level checks (llms.txt checks, coverage, auth-alternative-access) contribute to the score, and the overall score is capped at 59 (D) so a narrow signal can't produce a misleadingly high grade.

 ## All llms.txt links are cross-origin

scoring-reference.md

Lines changed: 13 additions & 0 deletions
@@ -282,6 +282,16 @@ capped at 39 (F). A site where agents have no effective way to access content
 should not score above F regardless of how well the infrastructure checks
 perform.

+### Diagnostic-Driven Cap: `single-page-sample`
+
+When the `single-page-sample` diagnostic fires (fewer than
+`MIN_PAGES_FOR_SCORING` pages discovered via random/deterministic sampling),
+all page-level checks are marked `notApplicable` and excluded from scoring.
+The remaining numerator/denominator can produce a misleadingly high overall
+score from a tiny subset of site-wide signal (typically just the llms.txt
+structural checks). To prevent this, the overall score is capped at 59 (D)
+when this diagnostic fires.
+
 When multiple caps apply, the lowest cap wins.

 The cap is applied **after** the weighted score calculation but diagnostics
@@ -592,6 +602,9 @@ in dependency order: `markdown-undiscoverable` and
 links so the tool can discover more pages. If testing a preview deployment,
 use --canonical-origin to rewrite cross-origin llms.txt links. You can also
 provide specific pages with --urls.
+- **Score cap**: When this diagnostic fires, the overall score is capped at
+59 (D). See "Diagnostic-Driven Cap: `single-page-sample`" in the Score Caps
+section.

 #### `cross-origin-llms-txt`

src/scoring/score.ts

Lines changed: 10 additions & 0 deletions
@@ -208,6 +208,16 @@ function computeCap(
     });
   }

+  // Single-page sample: page-level checks were marked notApplicable, so the
+  // remaining score reflects only a tiny subset of site-wide signal.
+  if (triggeredDiagnostics.has('single-page-sample')) {
+    caps.push({
+      cap: 59,
+      checkId: 'single-page-sample',
+      reason: 'Too few pages discovered to produce a representative score.',
+    });
+  }
+
   if (caps.length === 0) return undefined;

   // Lowest cap wins
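
Reviewer note: the "collect every applicable cap, then let the lowest one win" rule that this hunk extends can be sketched in isolation. This is a minimal illustration, not the body of `computeCap`: the `ScoreCap` interface name is assumed (its fields mirror the object pushed in the diff), and `pickLowestCap` and `applyCap` are hypothetical helpers introduced only for this example.

```typescript
// Fields mirror the object pushed in the diff; the interface name is assumed.
interface ScoreCap {
  cap: number;
  checkId: string;
  reason: string;
}

// Hypothetical helper: returns the most restrictive cap, or undefined if none apply.
function pickLowestCap(caps: ScoreCap[]): ScoreCap | undefined {
  if (caps.length === 0) return undefined;
  return caps.reduce((lowest, next) => (next.cap < lowest.cap ? next : lowest));
}

// Hypothetical helper: a cap can only lower a score, never raise it.
function applyCap(rawScore: number, cap: ScoreCap | undefined): number {
  return cap === undefined ? rawScore : Math.min(rawScore, cap.cap);
}

const caps: ScoreCap[] = [
  {
    cap: 59,
    checkId: 'single-page-sample',
    reason: 'Too few pages discovered to produce a representative score.',
  },
  {
    cap: 39,
    checkId: 'no-viable-path',
    reason: 'Agents have no effective way to access content at all.',
  },
];

const winner = pickLowestCap(caps);
console.log(winner?.checkId); // no-viable-path
console.log(applyCap(81, winner)); // 39
console.log(applyCap(20, winner)); // 20 (already below the cap)
```

The last call matches the existing "does not apply cap when score is already below cap" test: a cap is an upper bound, so a raw score under it passes through unchanged.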

test/unit/scoring/score.test.ts

Lines changed: 50 additions & 4 deletions
@@ -212,6 +212,48 @@ describe('computeScore', () => {
     expect(score.overall).toBeLessThanOrEqual(39);
   });

+  it('applies single-page-sample cap at 59', () => {
+    // Reproduces the issue #73 scenario: llms.txt exists and is the right size,
+    // but is structurally invalid. With only 1 page discovered, page-level
+    // checks are excluded as notApplicable, leaving the raw score driven by a
+    // tiny subset of site-wide signal (the issue reported 81/B without a cap).
+    const results: CheckResult[] = [
+      makeResult('llms-txt-exists', 'content-discoverability', 'pass'),
+      makeResult('llms-txt-valid', 'content-discoverability', 'fail'),
+      makeResult('llms-txt-size', 'content-discoverability', 'pass'),
+    ];
+    const score = computeScore(
+      makeReport(results, { samplingStrategy: 'deterministic', testedPages: 1 }),
+    );
+    expect(score.diagnostics.find((d) => d.id === 'single-page-sample')).toBeDefined();
+    expect(score.cap).toBeDefined();
+    expect(score.cap!.cap).toBe(59);
+    expect(score.cap!.checkId).toBe('single-page-sample');
+    expect(score.overall).toBeLessThanOrEqual(59);
+  });
+
+  it('single-page-sample cap loses to no-viable-path cap', () => {
+    // Both diagnostics fire; lowest cap (no-viable-path at 39) should win.
+    // Pass enough site-level checks to push raw score above 39 so the cap is
+    // observable in scoreResult.cap.
+    const results: CheckResult[] = [
+      makeResult('llms-txt-exists', 'content-discoverability', 'fail'),
+      makeResult('rendering-strategy', 'page-size', 'skip'),
+      makeResult('markdown-url-support', 'markdown-availability', 'fail'),
+      makeResult('llms-txt-size', 'content-discoverability', 'pass'),
+      makeResult('auth-gate-detection', 'authentication', 'pass'),
+      makeResult('auth-alternative-access', 'authentication', 'pass'),
+    ];
+    const score = computeScore(
+      makeReport(results, { samplingStrategy: 'deterministic', testedPages: 1 }),
+    );
+    expect(score.diagnostics.find((d) => d.id === 'no-viable-path')).toBeDefined();
+    expect(score.diagnostics.find((d) => d.id === 'single-page-sample')).toBeDefined();
+    expect(score.cap).toBeDefined();
+    expect(score.cap!.cap).toBe(39);
+    expect(score.cap!.checkId).toBe('no-viable-path');
+  });
+
   it('does not apply cap when score is already below cap', () => {
     // All-fail scenario: raw score is 0, cap at 59 wouldn't reduce it
     const results: CheckResult[] = [
@@ -530,15 +572,17 @@ describe('computeScore', () => {
         failBucket: 1,
       }),
     ];
-    // With N/A: only llms-txt-exists counts (pass) -> 100
+    // With N/A: only llms-txt-exists counts (pass) - raw score 100, but
+    // single-page-sample cap pulls it to 59. Verify exclusion via checkScores.
     const scoreNA = computeScore(
       makeReport(results, { testedPages: 1, samplingStrategy: 'random' }),
     );
     // Without N/A: both count, page-size-html fails -> less than 100
     const scoreNormal = computeScore(
       makeReport(results, { testedPages: 10, samplingStrategy: 'random' }),
     );
-    expect(scoreNA.overall).toBe(100);
+    expect(scoreNA.checkScores['page-size-html'].scoreDisplayMode).toBe('notApplicable');
+    expect(scoreNA.checkScores['llms-txt-exists'].scoreDisplayMode).toBe('numeric');
     expect(scoreNormal.overall).toBeLessThan(100);
   });

@@ -568,12 +612,14 @@ describe('computeScore', () => {
         spaShells: 1,
       }),
     ];
-    // With N/A: rendering-strategy is notApplicable, cap should NOT fire
+    // With N/A: rendering-strategy is notApplicable, so its cap should NOT
+    // fire. (single-page-sample's own 59 cap may still apply, but we're
+    // asserting that the rendering-strategy cap specifically doesn't.)
     const scoreNA = computeScore(
       makeReport(results, { testedPages: 1, samplingStrategy: 'random' }),
     );
     expect(scoreNA.checkScores['rendering-strategy'].scoreDisplayMode).toBe('notApplicable');
-    expect(scoreNA.cap).toBeUndefined();
+    expect(scoreNA.cap?.checkId).not.toBe('rendering-strategy');

     // Without N/A: same data, cap SHOULD fire
     const scoreNormal = computeScore(
