This document specifies the consumer-side contract that mcp.hosting implements against the mcp-compliance engine. It defines URL surfaces, data flow, edge cases, and quality bars for Phase 2 work.
The producer side (this repo) owns:
- The `ComplianceReport` JSON shape — see `schemas/report.v1.json`.
- `runComplianceSuite()` — the engine, with the `onTestComplete(result)` callback for live streaming.
- The `--format json` CLI output, byte-stable across runs (modulo timestamps and `durationMs`).
The consumer side (mcp.hosting) owns everything described below.
The mcp.hosting brand is the platform; every compliance feature lives under it. The new surfaces:
| Path | Purpose | Cache strategy |
|---|---|---|
| `GET /compliance` | "Test your server" landing + form | static, 1h CDN |
| `POST /compliance/test` | Kick off a test run, return a job ID | no-cache |
| `GET /compliance/jobs/<jobId>/stream` | Live SSE stream of test results | no-cache |
| `GET /compliance/ext/<hash>` | Public report page for a tested server | edge-cache 5m, invalidate on re-test |
| `GET /api/compliance/ext/<hash>/badge` | SVG badge | edge-cache 1h, invalidate on re-test |
| `GET /api/compliance/ext/<hash>/report.json` | Raw JSON report (for tooling) | edge-cache 5m |
| `GET /compliance/leaderboard` | Public ranking by grade | edge-cache 1m |
| `GET /compliance/tests/<test-id>` | Per-test explainer (81 pages) | static, 1h CDN |
The `<hash>` is the first 24 hex characters of the SHA-256 of the server URL — already implemented in `src/badge.ts:8`.
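For reference, a minimal sketch of that hashing scheme (the canonical implementation is `urlHash()` in `src/badge.ts:8`; this is an illustration, not a copy of it):

```ts
import { createHash } from "node:crypto";

// The <hash> used in every URL above: the first 24 hex characters of
// the SHA-256 of the tested server URL, per src/badge.ts:8.
function urlHash(url: string): string {
  return createHash("sha256").update(url).digest("hex").slice(0, 24);
}
```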
```
┌────────────┐   POST /compliance/test    ┌─────────────────┐
│  Browser   │ ──────────────────────────▶│ mcp.hosting API │
│  or CLI    │ { url, auth?, transport }  └────────┬────────┘
└────────────┘                                     │
      ▲                                            ▼
      │ SSE stream                      ┌─────────────────────┐
      │ per-test results                │     Test runner     │
      │                                 │    (Node worker)    │
      │                                 └──────────┬──────────┘
      │                                            │
      │                                            ▼
      │                                 ┌─────────────────────┐
      │                                 │ runComplianceSuite  │
      │                                 │ ({ onTestComplete })│
      │                                 └──────────┬──────────┘
      │                                            │
      │  GET /compliance/ext/<hash>                ▼
      │ ◀─────────────────────────────  store full report
                                        in DB keyed by hash
                                        + index in leaderboard
```
Request:

```jsonc
{
  "url": "https://example.com/mcp",
  "auth": "Bearer …",    // optional, header value or shorthand
  "transport": "http",   // "http" | "stdio"; stdio rejected from public web UI
  "publish": true        // false = test for me but don't list publicly
}
```

Response:

```json
{ "jobId": "01J9X…", "streamUrl": "/compliance/jobs/01J9X…/stream" }
```

Behavior:
- Reject `stdio` transport from the public web UI (security: arbitrary command execution). Stdio runs only via the CLI on user machines.
- Rate-limit to 5 concurrent jobs per IP; queue beyond that.
- Validate `url` is `http(s)://`; reject internal-IP ranges via the same patterns used in `src/runner.ts:51-59` (see the sketch after this list).
- Hash the URL with the same scheme as `urlHash()` in `src/badge.ts:8` so the resulting `<hash>` matches CLI-generated badge URLs.
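A sketch of those input checks, assuming an Express-style JSON body. The private-address patterns here are placeholders; the authoritative list is the one in `src/runner.ts:51-59`:

```ts
// Placeholder private/loopback patterns; the real runner's list
// (src/runner.ts:51-59) is authoritative and more complete.
const PRIVATE_HOST = /^(localhost$|127\.|10\.|192\.168\.|169\.254\.)/i;

function validateTestRequest(body: { url: string; transport: "http" | "stdio" }): URL {
  if (body.transport === "stdio") {
    // Arbitrary command execution risk: stdio is CLI-only.
    throw new Error("stdio transport is not testable from the web UI");
  }
  const url = new URL(body.url); // throws on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error("url must be http(s)://");
  }
  if (PRIVATE_HOST.test(url.hostname)) {
    throw new Error("internal addresses are rejected");
  }
  return url;
}
```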
SSE stream. Events:
```
event: test
data: { "id": "transport-post", "name": "...", "category": "transport", "passed": true, "required": true, "details": "HTTP 200", "durationMs": 42, "specRef": "..." }

event: test
data: { ... }

event: complete
data: { "hash": "abc123…", "reportUrl": "/compliance/ext/abc123…", "grade": "A", "score": 92.5 }

event: error
data: { "message": "Server unreachable" }
```
Implementation: spawn `runComplianceSuite()` on a worker and pipe `onTestComplete` callbacks into the SSE response. The full report (the final return value) is stored in the DB before the `complete` event fires.
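A sketch of that bridge, assuming an Express-style route handler. `lookupJob` and `saveReport` are hypothetical plumbing; `runComplianceSuite`/`onTestComplete` follow the producer-side contract described at the top:

```ts
import type { Request, Response } from "express";

// Hypothetical externals for illustration.
declare function runComplianceSuite(opts: {
  url: string;
  onTestComplete: (result: unknown) => void;
}): Promise<{ url: string; grade: string; score: number }>;
declare function saveReport(report: unknown): Promise<void>;
declare function lookupJob(id: string): { url: string };
declare function urlHash(url: string): string;

async function streamJob(req: Request, res: Response): Promise<void> {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  const send = (event: string, data: unknown) =>
    res.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);

  try {
    const { url } = lookupJob(req.params.jobId);
    const report = await runComplianceSuite({
      url,
      onTestComplete: (result) => send("test", result), // live per-test events
    });
    await saveReport(report); // persist the full report BEFORE announcing completion
    const hash = urlHash(report.url);
    send("complete", {
      hash,
      reportUrl: `/compliance/ext/${hash}`,
      grade: report.grade,
      score: report.score,
    });
  } catch (err) {
    send("error", { message: (err as Error).message });
  } finally {
    res.end();
  }
}
```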
Postgres tables (or equivalent):
```sql
CREATE TABLE compliance_reports (
  hash            TEXT PRIMARY KEY,        -- 24-char URL hash
  url             TEXT NOT NULL,           -- the tested URL
  schema_version  TEXT NOT NULL,           -- pin against; reject unknown
  spec_version    TEXT NOT NULL,
  tool_version    TEXT NOT NULL,
  report          JSONB NOT NULL,          -- the full ComplianceReport
  grade           CHAR(1) NOT NULL,        -- denormalized for index
  score           NUMERIC(5,2) NOT NULL,   -- denormalized for index
  tested_at       TIMESTAMPTZ NOT NULL,
  published       BOOLEAN NOT NULL DEFAULT TRUE,
  owner_verified  BOOLEAN NOT NULL DEFAULT FALSE,
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX ON compliance_reports (grade, score DESC) WHERE published;
CREATE INDEX ON compliance_reports (tested_at DESC) WHERE published;

CREATE TABLE compliance_jobs (
  id           TEXT PRIMARY KEY,  -- ULID
  url          TEXT NOT NULL,
  hash         TEXT NOT NULL,
  status       TEXT NOT NULL,     -- queued | running | complete | error
  started_at   TIMESTAMPTZ,
  finished_at  TIMESTAMPTZ,
  error        TEXT
);
```

Validate every report blob against `schemas/report.v1.json` on insert. Reject inserts where `schemaVersion` is unknown — never guess at the shape.
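A sketch of that gate using Ajv against a local copy of the schema (`assertValidReport` is a hypothetical helper; call it inside the insert transaction):

```ts
import Ajv from "ajv";
// Assumes resolveJsonModule (or equivalent) for the schema import.
import schema from "./schemas/report.v1.json";

const ajv = new Ajv();
const validate = ajv.compile(schema);

function assertValidReport(report: { schemaVersion?: string }): void {
  // Reject unknown schema versions outright; never guess at the shape.
  if (report.schemaVersion !== "1") {
    throw new Error(`Unknown schemaVersion: ${report.schemaVersion ?? "missing"}`);
  }
  if (!validate(report)) {
    throw new Error(`Report rejected: ${ajv.errorsText(validate.errors)}`);
  }
}
```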
`GET /api/compliance/ext/<hash>/badge` → SVG. Color-coded by grade:

| Grade | Color (hex) | Label |
|---|---|---|
| A | `#3fb950` (green) | A — MCP Compliant |
| B | `#7cba2c` (lime) | B — MCP Compliant |
| C | `#d29922` (yellow) | C — MCP Partial |
| D | `#db6d28` (orange) | D — MCP Partial |
| F | `#f85149` (red) | F — Not Compliant |
Constraints:
- Size ≤ 2 KB.
- `<title>` element: `MCP Compliance: Grade <X> (<score>%)` for screen readers.
- shields.io-compatible width/height so it lines up next to other badges in a README.
- `Cache-Control: public, max-age=3600, s-maxage=3600`. Invalidate on re-test.
- If hash unknown: 404 SVG with a neutral "untested" badge (do not 500).
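A deliberately simplified rendering sketch. The geometry here is not shields.io-accurate (the real badge must be); colors and labels follow the table above, and `renderBadge` is hypothetical:

```ts
// Colors and labels from the grade table; unknown hashes fall back to a
// neutral "untested" badge, per the 404 constraint.
const GRADE_COLORS: Record<string, string> = {
  A: "#3fb950", B: "#7cba2c", C: "#d29922", D: "#db6d28", F: "#f85149",
};
const GRADE_LABELS: Record<string, string> = {
  A: "MCP Compliant", B: "MCP Compliant",
  C: "MCP Partial", D: "MCP Partial", F: "Not Compliant",
};

function renderBadge(grade: string | null, score: number | null): string {
  const color = (grade && GRADE_COLORS[grade]) || "#9f9f9f";
  const text = grade ? `${grade} — ${GRADE_LABELS[grade]}` : "untested";
  return `<svg xmlns="http://www.w3.org/2000/svg" width="160" height="20" role="img">
  <title>MCP Compliance: Grade ${grade ?? "?"} (${score ?? 0}%)</title>
  <rect rx="3" width="160" height="20" fill="${color}"/>
  <text x="80" y="14" text-anchor="middle" font-family="Verdana,sans-serif" font-size="11" fill="#fff">${text}</text>
</svg>`;
}
```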
Server-rendered, mobile-first, social-share-optimized.
Required sections (in order):
- Hero: server URL (truncated), grade letter (large), score, tested-at date, "Re-test" button.
- Verdict: `pass | partial | fail` with a one-sentence summary.
- Category breakdown: bar per category with passed/total. Click → filter the test list.
- Test list: every test, grouped by category. Each row: status icon, name, details, link to spec, link to per-test explainer page.
- Server info: protocol version, name, version, capabilities, tool/resource/prompt counts.
- Embed your badge: copy-paste markdown + HTML snippets (generator sketch after this list).
- Footer: "Tested with mcp-compliance v`<toolVersion>` against MCP spec `<specVersion>`. Not affiliated with Anthropic."
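A sketch of the snippet generator for the "Embed your badge" section; URL shapes follow the routing table at the top, and `embedSnippets` is hypothetical:

```ts
// Produces the copy-paste embed snippets for a report page. Both
// variants link the badge image to the public report page.
function embedSnippets(hash: string) {
  const badge = `https://mcp.hosting/api/compliance/ext/${hash}/badge`;
  const report = `https://mcp.hosting/compliance/ext/${hash}`;
  return {
    markdown: `[![MCP Compliance](${badge})](${report})`,
    html: `<a href="${report}"><img src="${badge}" alt="MCP Compliance"></a>`,
  };
}
```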
Open Graph metadata:
```html
<meta property="og:title" content="MCP Compliance: Grade A (92%) — example.com">
<meta property="og:image" content="https://mcp.hosting/api/compliance/ext/<hash>/og.png">
<meta property="og:description" content="82/88 tests passed. Tested 2026-04-12.">
```

The `og.png` is generated from the badge + summary at request time (or pre-generated on report insert).
81 pages, one per test. Content seeded from `TEST_DEFINITIONS` in `src/types.ts` and the `explain` MCP tool output.
Each page:
- What it tests — `description` from `TestDefinition`.
- Why it matters — written content (not auto-generated).
- Spec reference — deep link to the relevant spec section using `SPEC_BASE` + `specRef`.
- Common failure modes — written content.
- How to fix — `recommendation` from `TestDefinition`, expanded.
- Servers that pass / fail this test — auto-populated from the leaderboard data (page-model sketch after this list).
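A sketch of the page model this implies. Field names other than `description`, `specRef`, and `recommendation` (which come from `TestDefinition`) are hypothetical:

```ts
declare const SPEC_BASE: string; // from this repo, per the spec-reference bullet above

interface ExplainerPage {
  id: string;
  whatItTests: string;  // TestDefinition.description
  specUrl: string;      // SPEC_BASE + specRef
  howToFix: string;     // TestDefinition.recommendation, expanded by hand
  whyItMatters: string; // written content; must be non-empty before launch
  failureModes: string; // written content; must be non-empty before launch
}

function seedPage(def: {
  id: string;
  description: string;
  specRef: string;
  recommendation: string;
}): ExplainerPage {
  return {
    id: def.id,
    whatItTests: def.description,
    specUrl: SPEC_BASE + def.specRef,
    howToFix: def.recommendation,
    whyItMatters: "", // no stubs at launch: fail the build if still empty
    failureModes: "",
  };
}
```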
This is the SEO engine. Each page targets a niche search term; together they form the authority moat. No stub pages — every page must have real "Why it matters" and "Common failure modes" content before launch.
Default view: top 50 by grade then score, descending.
Filters:
- Grade (`A`/`B`/`C`/`D`/`F`)
- Transport (`http` only — stdio servers can't be publicly tested)
- Spec version (e.g. `2025-11-25`)
- Category — show only servers that pass all of `tools`, `security`, etc.
- Tested within (24h / 7d / 30d / all)
Sidebar: "Recently tested" (10 most recent, regardless of grade).
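The default view above, as a query sketch against `compliance_reports` (`db` stands in for any pg-style client):

```ts
declare const db: { query: (sql: string) => Promise<{ rows: unknown[] }> };

// Letter grades sort A→F ascending as text, so ORDER BY grade ASC puts
// the best grades first; the partial index
// (grade, score DESC) WHERE published covers this ordering.
const TOP_50 = `
  SELECT hash, url, grade, score, tested_at
  FROM compliance_reports
  WHERE published
  ORDER BY grade ASC, score DESC
  LIMIT 50
`;

const leaderboard = await db.query(TOP_50); // default, unfiltered view
```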
Server owners can claim or delist their entry via:
- DNS TXT challenge: `_mcp-compliance.<host>` containing a token issued by mcp.hosting (verification sketch after this list).
- Email-from-domain: send a request from any `@<host>` address.
Verified owners get:
- A "verified" badge on their leaderboard entry.
- The ability to delist or hide their server.
- Email notifications when their grade changes.
Unverified entries can be delisted on request (lower friction; just email support).
Verified entries are auto-retested monthly. If the grade drops, the owner is emailed. Public leaderboard always shows the most recent result.
When the smart-routing MCP discovers or proxies a target server:
- Look up the target's compliance grade by hashing its URL.
- If found, surface in discovery output: `{ url, name, grade: "A", score: 92.5, testedAt: "..." }`.
- If not found, optionally trigger a background test (rate-limited).
- Optional admin setting: `minGrade: "B"` — refuse to route to servers below threshold (sketched after this list).
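A sketch of the routing gate, reusing the badge's `urlHash` scheme for lookup (`getReport` and the config shape are hypothetical):

```ts
type Grade = "A" | "B" | "C" | "D" | "F";

declare function urlHash(url: string): string; // same scheme as src/badge.ts:8
declare function getReport(hash: string): Promise<{ grade: Grade } | null>;

async function shouldRoute(targetUrl: string, minGrade: Grade): Promise<boolean> {
  const report = await getReport(urlHash(targetUrl));
  if (!report) return true; // untested servers pass; optionally queue a background test
  return report.grade <= minGrade; // letter grades compare lexically: "A" <= "B"
}
```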
This is what makes the grade actually matter — servers with bad grades become invisible to a default mcp.hosting install.
Before launch:
- Badge renders correctly in GitHub README markdown preview (mobile + desktop).
- Report page loads in <1s on cold cache, <200ms warm.
- All 81 explainer pages have populated "Why it matters" and "Common failure modes" sections — no stubs.
- Leaderboard has ≥20 seeded entries from `modelcontextprotocol/servers` and top community servers.
- Open Graph image renders correctly when shared on Twitter/Slack/Discord.
- Re-testing a server invalidates the badge and report cache within 30s.
- DNS TXT verification flow works end-to-end.
- Router integration shows grades in discovery output for at least one default-routed server.
- `schemas/report.v1.json` is fetched and validated on every insert; bad reports rejected with a clear error.
Cross-cutting:
- `https://mcp.hosting/schemas/compliance/report.v1.json` resolves to the schema (matches the `$id`).
- mcpcompliance.io / .net / .sh redirect 301 to `https://mcp.hosting/compliance`.
- Footer on every compliance page includes "Not affiliated with Anthropic. Model Context Protocol is developed by Anthropic."
- mcp.hosting pins against `schemaVersion: "1"`. When this repo bumps to `"2"`, mcp.hosting:
  - Continues serving v1 reports from existing storage.
  - Stands up a v2 ingestion path against the new schema.
  - Migrates or dual-writes during transition; never silently mixes shapes.
- `toolVersion` is informational, never branched on.
- `specVersion` drives the displayed "tested against" string; old reports stay valid for their version with a "regrade against latest" CTA.
- Where does the test-runner worker live? Same Node process as the API, or a separate queue (e.g. Cloudflare Workers + Durable Objects, or BullMQ on a node)?
- Owner verification: DNS TXT only, or also support a `.well-known/mcp-compliance-owner` file?
- Open Graph image: pre-generate on insert (storage cost, fast serve) or render-on-request (cheaper storage, slower first hit)?
- Per-test explainer content: who writes it? (81 pages × ~300 words = ~24k words. Could be AI-drafted then human-edited, or fully ghostwritten.)
These don't block Phase 2 design but should be settled before implementation starts.