Generated: 2026-04-27
Architecture spec: wixie/prompts/mcp-client-golden-architecture/output-opus-4-7.json
| Check | Result |
|---|---|
| `tsc --noEmit` | clean |
| `npm test` | 136 passing / 7 todo / 0 fail (18 test files) |
| End-to-end integration | 6 tests against a real Node subprocess MCP server |
| Approved-deps-only | yes (zod, @opentelemetry/api, @opentelemetry/sdk-node, undici) |
| Audit verdict | HARDENING_NEEDED → HARDENED (4 findings fixed; 1 critical was a false positive) |
| Architectural issues caught by integration | 3 fixed: `lifecycle.<phase>` auto-subscribe, ack dedup, qualified-first resolve |
40 TypeScript files, 6267 LOC. From the v0.1 baseline (24 files, 1909 LOC) we added:
- crow (227 LOC) — Beta-Binomial trust posterior + Lanczos log-gamma + asymptotic digamma for closed-form Beta entropy. 8 tests.
- djinn (~210 LOC) — D1 LCS drift detection at anchor + post-session, immutable session anchor store. 5 tests. (D2 HMM deferred to v0.3.)
- emu (~220 LOC) — A2 Linear Runway Forecasting (mean ± 1.96σ CI), A1 read-loop / edit-revert pattern detection over 100-observation sliding window. 6 tests.
- gorgon (~270 LOC) — language-agnostic PageRank (Brin & Page d=0.85, max 50 iter, tolerance 1e-6) with dangling-node mass redistribution; cross-session snapshot + post-write hotspot-changed detection. 6 tests. (Tarjan SCC + Python-AST extraction → v0.3.)
- lich (~240 LOC) — primary owner of failure-mode 2 (tool poisoning). M1 static scan over 5 pattern categories (imperative-override, credential-request, suspicious TLD, base64-payload, hidden-Unicode) + M6 simplified EMA false-positive learning. Required: true — fail-closed. 7 tests.
- naga (~330 LOC) — multi-axis structural fingerprinting (N1 SHA-1 shape + N2 TF-IDF top-20 + N3 naming convention) with Jaccard similarity drift detection. Required: true for FM2/FM10 secondary mitigation. 6 tests.
- pech (~210 LOC) — in-memory ledger + per-vendor budget tracking + tier-boundary threshold detection (HIGH/MED/LOW/CRITICAL) + vendor-exhaustion kill switch. Required: true (always-tier per spec). 10 tests. (File-backed ledger + L1 EMA / L3 Z-score → v0.3.)
- sylph (~240 LOC) — W5 destructive-op gate (6 patterns: force-push, reset-hard, branch-D, rm-rf, force-with-lease, plain-push advisory) at trust-gate (required: true, fail-closed) + W2 boundary segmentation via Jaccard at post-session (advisory). 9 tests.
- `streamable-http.ts` (~340 LOC): single-endpoint POST + GET, `Accept: application/json, text/event-stream`, undici-based; SSE multi-line `data:` parsing; 8MB body cap (failure-mode 5); exponential backoff (initial 500ms, factor 2, max 30s, ±20% jitter, 10 attempts max); resume disabled by default (failure-mode 8) with opt-in `allowResume` + `sessionNonce` binding. 7 tests.
- `remaining-failure-modes.test.ts` rewritten: FM 3 (audience binding), FM 4 (secret masking), and FM 10 (schema-digest mismatch) now have real coverage. FM 2/6/8/9 remain `it.todo`, pointing at v0.3 surfaces.
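The backoff parameters listed for `streamable-http.ts` can be sketched as a pure delay function. This is a minimal illustration, not the module's code: the names `BackoffOptions`, `DEFAULT_BACKOFF`, and `backoffDelay` are hypothetical, and applying the 30s cap before the jitter is an assumption.

```typescript
// Hypothetical option shape; field names are illustrative, not the module's API.
interface BackoffOptions {
  initialMs: number;   // delay before the first retry
  factor: number;      // multiplier applied per attempt
  maxMs: number;       // cap on the un-jittered delay (assumed to apply before jitter)
  jitter: number;      // fractional jitter; 0.2 means ±20%
  maxAttempts: number; // retries allowed before giving up
}

// Values taken from the summary: 500ms, factor 2, 30s cap, ±20%, 10 attempts.
const DEFAULT_BACKOFF: BackoffOptions = {
  initialMs: 500,
  factor: 2,
  maxMs: 30000,
  jitter: 0.2,
  maxAttempts: 10,
};

// Returns the delay in ms before retry `attempt` (0-based),
// or null once the attempt budget is exhausted.
function backoffDelay(
  attempt: number,
  opts: BackoffOptions = DEFAULT_BACKOFF,
  rand: () => number = Math.random,
): number | null {
  if (attempt >= opts.maxAttempts) return null;
  const base = Math.min(opts.initialMs * Math.pow(opts.factor, attempt), opts.maxMs);
  const spread = (rand() * 2 - 1) * opts.jitter; // uniform in [-jitter, +jitter)
  return Math.round(base * (1 + spread));
}
```

Passing `rand` as a parameter keeps the function deterministic under test; a caller would typically loop on `backoffDelay(attempt)` and stop reconnecting when it returns `null`.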
The code-review agent flagged 1 critical, 3 high, 5 medium, 5 low.
The critical finding was a false positive. The `remaining` variable was misleadingly named: it held elapsed-past-deadline, not time-remaining. The `if (remaining > 0) return resolve()` check correctly fires only after the deadline expires. Renamed to `elapsedPastDeadline` for clarity. Same logic, no behavior change.
For the SSRF guard on OAuth metadata URLs, v0.1 only blocked `[::1]`. v0.2 strips brackets, then checks:
- `::1` loopback
- `::ffff:` IPv4-mapped IPv6 (defensive blanket reject: the WHATWG URL parser normalizes the dotted IPv4 tail to hex groups, which makes per-octet checks brittle, and there is no legitimate OAuth-metadata case for v4-mapped IPv6 literals)
- `fe80::/10` link-local (regex on lowercase prefix)
- `fc00::/7` unique-local
Public IPv6 (e.g., `[2606:4700:4700::1111]`) still passes. 6 new regression tests in `tests/security/ssrf-oauth-metadata.test.ts`.
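The four checks above can be sketched as a single predicate over the URL hostname. This is an illustrative sketch only: `isBlockedIpv6Host` is a hypothetical name, and the exact regexes used in the real guard are not shown in this summary, so the ones below are assumptions that cover the stated prefixes.

```typescript
// Hypothetical predicate; mirrors the four v0.2 checks, not the actual source.
function isBlockedIpv6Host(hostname: string): boolean {
  // URL hostnames keep the brackets around IPv6 literals, so strip them first.
  const host = hostname.replace(/^\[|\]$/g, "").toLowerCase();
  if (host.indexOf(":") === -1) return false;       // not an IPv6 literal; out of scope here
  if (host === "::1") return true;                  // loopback
  if (host.indexOf("::ffff:") === 0) return true;   // IPv4-mapped IPv6, blanket reject
  if (/^fe[89ab][0-9a-f]:/.test(host)) return true; // fe80::/10 link-local (fe80 to febf)
  if (/^f[cd][0-9a-f]{2}:/.test(host)) return true; // fc00::/7 unique-local (fc00 to fdff)
  return false;                                     // public IPv6 passes
}
```

The blanket `::ffff:` prefix test is what makes the hex-normalized form (for example `[::ffff:7f00:1]`) rejectable without decoding the embedded IPv4 octets.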
`parseJsonRpc` now validates field shapes after the `jsonrpc: "2.0"` check:
- `method` MUST be string if present
- `id` MUST be number / string / null if present
- `error.code` MUST be number, `error.message` MUST be string
12 new regression tests in tests/protocol/jsonrpc-validation.test.ts.
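The shape rules above can be sketched as a standalone validator. This is a hedged illustration rather than the actual `parseJsonRpc`: the function name `validateJsonRpcShape` and the choice to return a list of violations (instead of throwing) are assumptions made for the example.

```typescript
// Hypothetical validator: returns human-readable violations; an empty list means the shape is valid.
function validateJsonRpcShape(msg: unknown): string[] {
  if (typeof msg !== "object" || msg === null || Array.isArray(msg)) {
    return ["message must be an object"];
  }
  const m = msg as Record<string, unknown>;
  // The version check comes first; field shapes are only inspected afterwards.
  if (m.jsonrpc !== "2.0") return ['jsonrpc must be "2.0"'];

  const errors: string[] = [];
  if ("method" in m && typeof m.method !== "string") {
    errors.push("method must be a string");
  }
  if ("id" in m && !(typeof m.id === "number" || typeof m.id === "string" || m.id === null)) {
    errors.push("id must be a number, string, or null");
  }
  if ("error" in m) {
    const e = m.error;
    if (typeof e !== "object" || e === null) {
      errors.push("error must be an object");
    } else {
      const err = e as Record<string, unknown>;
      if (typeof err.code !== "number") errors.push("error.code must be a number");
      if (typeof err.message !== "string") errors.push("error.message must be a string");
    }
  }
  return errors;
}
```

For instance, `validateJsonRpcShape({ jsonrpc: "2.0", id: {}, method: 5 })` reports two violations, one for `id` and one for `method`.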
v0.1 scanned only the stringified payload, so `rm -rf /` passed as `args: ["rm", "-rf", "/"]` slipped through: each arg is a separate JSON-quoted string, and the pattern never sees them adjacent. v0.2 also reconstructs the command line by joining string-array args and runs the CVE patterns against both corpora, deduplicating hits. 5 new regression tests in `tests/security/hydra-args-array.test.ts`.
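A minimal sketch of the two-corpus scan described above. Everything here is illustrative: the single pattern, `collectStringArrays`, and `scanPayload` are hypothetical stand-ins, and the real pattern list lives in `hydra/cve-patterns.ts`.

```typescript
// Illustrative pattern only; no `g` flag, so RegExp.test() carries no lastIndex state.
const PATTERNS: { id: string; re: RegExp }[] = [
  { id: "rm-rf-root", re: /\brm\s+-(?:rf|fr)\s+\/(?![\w.-])/i },
];

// Recursively collect every non-empty array whose elements are all strings.
function collectStringArrays(value: unknown, out: string[][] = []): string[][] {
  if (Array.isArray(value)) {
    if (value.length > 0 && value.every((v) => typeof v === "string")) {
      out.push(value as string[]);
    } else {
      value.forEach((v) => collectStringArrays(v, out));
    }
  } else if (typeof value === "object" && value !== null) {
    const obj = value as Record<string, unknown>;
    Object.keys(obj).forEach((k) => collectStringArrays(obj[k], out));
  }
  return out;
}

// Scan the stringified payload and each reconstructed command line; dedupe by pattern id.
function scanPayload(payload: unknown): string[] {
  const corpora = [JSON.stringify(payload)].concat(
    collectStringArrays(payload).map((args) => args.join(" ")),
  );
  const hits: string[] = [];
  for (const text of corpora) {
    for (const p of PATTERNS) {
      if (p.re.test(text) && hits.indexOf(p.id) === -1) hits.push(p.id);
    }
  }
  return hits;
}
```

With this sketch, `{ args: ["rm", "-rf", "/"] }` is caught only through the joined corpus, while `{ command: "rm -rf /" }` is caught through the stringified one; a payload containing both forms still yields a single hit.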
| # | Finding | Status |
|---|---|---|
| M1 | hydra `lastIndex` reset before `test()` | acknowledged; v0.3 |
| M2 | namespace registry deep-key sort for digest | acknowledged; v0.3 |
| M3 | advisory veto recorded as `degraded_findings` | acknowledged; v0.3 |
| M4 | lifecycle `contextFromEvent` hardcoded `sampling_depth`/`deadline` | tracked in v0.2 follow-ups |
| M5 | hydra HIGH severity warns instead of vetoes | acknowledged; v0.3 (configurable threshold) |
| L1 | `buildResourceParameter` doesn't invoke SSRF guard | v0.3 |
| L2 | Bus derived-event recursion has no depth guard | v0.3 |
| L3 | Phase event IDs deterministic not UUID | minor; v0.3 |
| L4 | PEM regex super-linear backtracking risk | v0.3 (bounded quantifier) |
| L5 | `pech.required = true` not enforced by CI guard | v0.3 |
A real Node subprocess (`tests/fixtures/mock-mcp-server.mjs`) speaking JSON-RPC over stdio is driven through the full v0.2 stack via the new `McpClient` class. 6 integration tests:
| Test | Proves |
|---|---|
| initialize → tools/list → tools/call | Real handshake, JSON-RPC ID correlation, namespace registration, dispatch through orchestrator |
| hydra vetoes `rm -rf /` via args-array | Trust-gate fail-closed; audit's args-array fix works live; `SecurityVetoError` propagates |
| schema-drift detection across 2× tools/list | Namespace SHA-256 digest pin catches mutated description (FM10 mitigation) |
| pech ledger appends per call | Vendor budget configured, one call → ledger grows by exactly 1 (verifies ack-dedup) |
| unknown qualified name rejected | ToolNotFoundError |
| bus tap shows all 7 phases | anchor / trust-gate / pre-dispatch / dispatch / post-response / post-session / cross-session |
- Plugins never acked: adapters subscribed to domain topics (`mcp.tool.call.requested`) but the orchestrator publishes `lifecycle.<phase>`. Per ADR-001, plugins should subscribe to phase-named topics. Fixed: the orchestrator auto-subscribes each plugin to `lifecycle.<phase>` for every declared phase. Plugins gate on `event.topic` in `onPhase`.
- Pech ledger doubled: auto-subscribe meant the same phase fired the handler twice (one domain topic + one lifecycle topic). Fixed: `AckTracker.has()` + wired-handler dedup.
- `shell.exec` mis-resolved: the `query.includes('.')` heuristic treated tools with dots in their bare name as qualified. Fixed: `NamespaceRegistry.resolve` tries `byQualified` first, then falls back to `byBare`.
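The qualified-first lookup from the last fix can be sketched as follows. This is an assumption-heavy illustration: the class name `NamespaceRegistrySketch`, the `ToolEntry` shape, the `server.tool` qualified-name format, and the handling of ambiguous bare names are all hypothetical; only the "try `byQualified`, then `byBare`" order comes from the summary.

```typescript
// Hypothetical entry shape; the real registry likely stores more (schema digest, etc.).
interface ToolEntry {
  server: string;
  name: string; // bare tool name; may itself contain dots, e.g. "shell.exec"
}

class NamespaceRegistrySketch {
  private byQualified = new Map<string, ToolEntry>();
  private byBare = new Map<string, ToolEntry[]>();

  register(entry: ToolEntry): void {
    // Assumed qualified format: "<server>.<tool>".
    this.byQualified.set(entry.server + "." + entry.name, entry);
    const sameBare = this.byBare.get(entry.name) || [];
    sameBare.push(entry);
    this.byBare.set(entry.name, sameBare);
  }

  // No dot-counting heuristic: an exact qualified match wins, otherwise
  // the query is treated as a bare name.
  resolve(query: string): ToolEntry | undefined {
    const exact = this.byQualified.get(query);
    if (exact) return exact;
    const bare = this.byBare.get(query) || [];
    // Assumption: an ambiguous bare name (registered by several servers) does not resolve.
    return bare.length === 1 ? bare[0] : undefined;
  }
}
```

Because the lookup never inspects the query for a `.`, a bare name such as `shell.exec` resolves through `byBare` instead of being misread as server `shell` plus tool `exec`. A real implementation would presumably raise `ToolNotFoundError` where this sketch returns `undefined`.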
src/
├── orchestration/{lifecycle,request-context}.ts
├── bus/{event-types,pubsub}.ts
├── transport/{stdio,streamable-http}.ts
├── oauth/{pkce,resource-indicators,metadata-validator}.ts
├── registry/namespace.ts
├── protocol/jsonrpc.ts
├── plugins/
│ ├── plugin-contract.ts
│ ├── hydra.adapter.ts + hydra/cve-patterns.ts
│ ├── crow.adapter.ts (Beta-Binomial)
│ ├── djinn.adapter.ts (LCS drift)
│ ├── emu.adapter.ts (runway forecast)
│ ├── gorgon.adapter.ts (PageRank)
│ ├── lich.adapter.ts (M1 + M6 EMA)
│ ├── naga.adapter.ts (N1+N2+N3 fingerprint)
│ ├── pech.adapter.ts (ledger + thresholds)
│ └── sylph.adapter.ts (W5 + W2)
├── schematic.README.md (non-runtime per spec)
├── client/mcp-client.ts (high-level glue, JSON-RPC correlation)
└── index.ts
tests/
├── orchestration/smoke.test.ts (3)
├── oauth/pkce-roundtrip.test.ts (7)
├── protocol/jsonrpc-validation.test.ts (12) ← v0.2
├── security/
│ ├── tool-name-collision.test.ts (7)
│ ├── unbounded-resources.test.ts (4)
│ ├── ssrf-oauth-metadata.test.ts (18) ← +6 IPv6
│ ├── hydra-args-array.test.ts (5) ← v0.2
│ └── remaining-failure-modes.test.ts (10 real + 7 todo)
├── transport/streamable-http.test.ts (7) ← v0.2
├── plugins/
│ ├── crow.test.ts (8)
│ ├── djinn.test.ts (5)
│ ├── emu.test.ts (6)
│ ├── gorgon.test.ts (6)
│ ├── lich.test.ts (7)
│ ├── naga.test.ts (6)
│ ├── pech.test.ts (10)
│ └── sylph.test.ts (9)
├── integration/end-to-end.test.ts (6) ← v0.2, real subprocess
└── fixtures/mock-mcp-server.mjs ← v0.2, Node JSON-RPC server
Browser dashboard (Vite + Preact UI): dropped because the terminal and VS Code surfaces sit where developers already work, whereas the dashboard required switching to a browser tab and was ignored in practice. The WebSocket broadcaster (`src/observability/dashboard-server.ts`) stays because the VS Code webview consumes it. The notifier also stays. Bus, orchestrator, plugins, and transports are unchanged.
Removed files:
- `dashboard/`: Vite + Preact browser UI (App.tsx, components, lib/ws-client.ts, styles.css, index.html, vite.config.ts, etc.)
- `scripts/run-dashboard.ts`: launcher that spawned Vite + the browser UI (217 LOC)
- OAuth replay defense: nonce + freshness store. Currently audience binding is the only check. Surface: `src/oauth/nonce-store.ts`.
- Server spoofing (failure-mode 6): TLS cert pinning + Authorization-header response-origin check. Surface: `src/transport/tls-pin.ts`.
- Full trust-pin (failure-mode 10): SHA-256 over (command + args + binary digest + env allowlist + URL + per-tool schema digest). Surface: `src/registry/trust-pin.ts`.
- Lich M5 sandbox: currently M1 static + M6 EMA. M5 requires a sandbox runtime (Docker / nsjail / `vm` module).
- Djinn D2 HMM drift: currently D1 LCS only. Adds a Baum-Welch labeller for ON_TASK / SIDEQUEST / LOST.
- Pech file-backed ledger + L1 EMA forecast + L3 Z-score anomaly + L4 cache-waste — currently in-memory.
- Gorgon Tarjan SCC + Python-AST extraction — currently language-agnostic PageRank only.
- Bus depth-bounded derived-event recursion — currently unbounded.
- Lifecycle full RequestContext propagation — currently reconstructs minimal context per plugin invocation.
- Hydra HIGH severity → veto — currently warns; configurable severity threshold.
cd client/enchanter
npm install
npm run typecheck
npm test

Expected: `tsc` clean, 136 passing / 7 todo / 0 fail.
- Files produced (count): 40 TypeScript + 5 config + 1 IMPLEMENTATION_SUMMARY.md = 46
- Files implemented vs stubbed: 22 implemented / 1 stub (`schematic.README.md` per spec edge_case)
- Tests passing vs todo: 136 passing / 7 todo
- Approved deps used: typescript, vitest, zod, @opentelemetry/api, @opentelemetry/sdk-node, undici
- Author-judgment count: ~30 (each plugin's algorithmic simplifications + threshold choices documented inline)
- Audit findings closed: 3 of 3 high (IPv6 SSRF, JSON-RPC validation, hydra args-array); 1 critical was false positive (variable rename)
- Highest-risk v0.3 follow-up: OAuth replay nonce defense — current implementation has audience binding only.