Skip to content

Commit 209ed58

Browse files
committed
feat: prompt-leak resilience + runtime boundaries (F34) + enchanter-hooks v0.7
Hardens VIS against untrusted-context / prompt-leak failures and ships the runtime-enforcement layer for them. Conduct & taxonomy: - F34 — Untrusted-context injection (indirect prompt injection): host-agnostic taxonomy doc + runbook, registered in failure-modes.md and the taxonomy index. - recipe agent-runtime-boundaries.md (host trust boundaries, anti-laundering). - eval docs/evals/agent-boundary-checklist.md (6 adversarial cases). - web/conduct resilience set + core context-budget / memory-discipline modules. enchanter-hooks v0.7 (11 -> 15) — the runtime half, all advisory/fail-open/quiet: - context-taint-scan PostToolUse(Read|Grep|WebFetch): directive language in retrieved tool_response (F34 runtime counter). - delegation-scope-guard SubagentStart: scope+provenance reminder injected into a risky subagent's own context (anti-laundering). - evidence-gate Stop: flags unbacked completion/verification claims (no loop; stderr+exit 0, never blocks). - dependency-intent-receipt PreToolUse(Bash + Write|Edit): supply-chain provenance. - compact-checkpoint extended into an obligation anchor (approvals / denied approaches / security boundaries / verification debt). - tests/verify-hooks.sh self-test (74 checks: fail-open, quiet-on-benign, one advisory per trigger, valid JSON, LF-only, no network calls). README / marketplace / plugin counts updated (22 codes, 10 recipes, 15 hooks).
1 parent b6a5fd1 commit 209ed58

26 files changed

Lines changed: 1741 additions & 32 deletions
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
---
2+
"@enchanter-ai/vis-meta": minor
3+
---
4+
5+
Harden VIS for prompt-leak resilience and runtime agent boundaries.
6+
7+
- **New failure code F34 — Untrusted-context injection** (indirect prompt injection): the canonical, host-agnostic code for "content from a data channel obeyed as a principal instruction," generalizing the web-only sub-code F13.1. Full taxonomy doc (`packages/safety/taxonomy/f34-untrusted-context-injection.md`) + runbook (`packages/safety/runbooks/F34.md`), registered in `failure-modes.md` and the taxonomy index.
8+
- **New recipe `agent-runtime-boundaries.md`**: host-side trust boundaries — untrusted-content wrap, read-only default, host-enforced tool permissions, provenance preservation across multi-agent hand-offs (anti-laundering), with a tool permission matrix and example host manifest. Encodes the principle that the system prompt is not a security boundary; security-critical enforcement lives in the runtime.
9+
- **New eval artifact `docs/evals/agent-boundary-checklist.md`**: six adversarial pass/fail cases (reveal-system-prompt, indirect instruction in retrieved content, tool call requested by untrusted content, rewritten risky request across agents, state-change without approval, false-completion claim).
10+
- **enchanter-hooks v0.7 — the runtime-enforcement layer of the above**: four new advisory, fail-open hooks (11 → 15) tied to deterministic Claude Code hook events. `context-taint-scan` (`PostToolUse(Read|Grep|WebFetch)`) flags directive language in retrieved `tool_response` — the runtime half of F34's counter. `delegation-scope-guard` (`SubagentStart`) injects a scope+provenance reminder into a risky subagent's own context (anti-laundering, per `agent-runtime-boundaries.md`). `evidence-gate` (`Stop`) flags unbacked completion claims (the boundary-checklist false-completion case). `dependency-intent-receipt` (`PreToolUse`) asks for supply-chain provenance on dep changes. Plus an obligation-anchor extension to `compact-checkpoint` (approvals / denied approaches / security boundaries / verification debt). New `packages/hooks/tests/verify-hooks.sh` self-test (74 checks).
11+
- README counts and pointers updated (22 taxonomy-doc'd codes, 10 recipes, 22 runbooks, 15 advisory hooks).

.claude-plugin/marketplace.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,6 @@
1414
{ "name": "enchanter-memory", "source": "memory", "version": "0.6.0", "description": "Cross-session memory hygiene: working memory, decay, recall verification." },
1515
{ "name": "enchanter-cost", "source": "cost", "version": "0.6.0", "description": "Session economics: cost-accounting, latency-budgeting, eval-harnesses recipe." },
1616
{ "name": "enchanter-safety", "source": "safety", "version": "0.6.0", "description": "Safety + compliance + security + operator-wiring: refusal-and-recovery, F15-F21 taxonomy, FedRAMP/ISO/SOC2/NIST evidence, pentest, synthetic-fire, OTLP/PagerDuty/Sentry wiring." },
17-
{ "name": "enchanter-hooks", "source": "hooks", "version": "0.6.0", "description": "Advisory Claude Code hooks (fail-open, deterministic, quiet) that enforce the conduct substrate at lifecycle events: post-compaction checkpoint (F03), pre-write secret scan, post-edit debug-artifact hygiene. Install activates hooks natively — no settings.json editing." }
17+
{ "name": "enchanter-hooks", "source": "hooks", "version": "0.7.0", "description": "Advisory Claude Code hooks (fail-open, deterministic, quiet) that enforce the conduct substrate at lifecycle events. v0.7 = 15 hooks across SessionStart / PreToolUse / PostToolUse / SubagentStart / Stop: post-compaction checkpoint + obligation anchor (F03), secret scan, config/substrate/authorship/append-only/reversibility guards, debug + syntax + path hygiene, plus context-taint-scan (F34 indirect prompt injection), dependency-intent-receipt (supply-chain), delegation-scope-guard (multi-agent laundering), and evidence-gate (false completion). Install activates hooks natively — no settings.json editing." }
1818
]
1919
}

README.md

Lines changed: 20 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -9,15 +9,15 @@
99
<img alt="7 packages" src="https://img.shields.io/badge/Packages-7-bc8cff?style=for-the-badge">
1010
<img alt="37 conduct modules" src="https://img.shields.io/badge/Modules-37-58a6ff?style=for-the-badge">
1111
<img alt="12 engines" src="https://img.shields.io/badge/Engines-12-d29922?style=for-the-badge">
12-
<img alt="21 failure codes" src="https://img.shields.io/badge/F--codes-F01%E2%80%93F21-f0883e?style=for-the-badge">
12+
<img alt="22 failure codes" src="https://img.shields.io/badge/F--codes-F01%E2%80%93F21%2C+F34-f0883e?style=for-the-badge">
1313
<a href="https://www.repostatus.org/#active"><img alt="Project Status: Active" src="https://www.repostatus.org/badges/latest/active.svg"></a>
1414
</p>
1515

1616
> **An @enchanter-ai product — dependency-free, model-agnostic, dogfooded across the ecosystem.**
1717
1818
The behavioral substrate for building durable AI agents — conduct, engines, taxonomy, and the math behind all three.
1919

20-
**37 conduct modules. 12 engines. 21 failure codes. 9 recipes. Zero runtime dependencies.**
20+
**37 conduct modules. 12 engines. 22 failure codes. 10 recipes. Zero runtime dependencies.**
2121

2222
> A 12-line `paginate()` function has an off-by-one. The PR title says *"fix the off-by-one in pagination."* The agent rewrites it as a `Paginator` class, adds a docstring nobody asked for, renames `perPage` to `n`, and slips the actual one-character fix onto line 4. Type-check passes. Tests pass. The bug is fixed, but the codebase grew a new pattern nobody decided on, and the diff buries the fix under 30 lines of unsolicited refactor (F04 task drift).
2323
>
@@ -29,7 +29,7 @@ The behavioral substrate for building durable AI agents — conduct, engines, ta
2929

3030
**In plain English:** Most agent stacks ship with prompts, tools, and hopes. The thing that actually keeps an agent from refactoring code you didn't ask it to touch, or pushing to main after you said not to, isn't another tool — it's a behavior rule that survives the long context. vis is the dependency-free pile of those rules, plus the math, taxonomy, and host recipes around them.
3131

32-
**Technically:** 37 conduct modules across 7 conduct packages (`core` / `skills` / `orchestration` / `safety` / `web` / `memory` / `cost`), plus a `hooks` package shipping 6 runtime advisory hooks (the **enchanter-hooks** plugin, installable via the vis marketplace). 12 algorithmic engines with paper-backed derivations (Aho-Corasick pattern detection, Shannon entropy, Beta-Bernoulli trust scoring, Markov drift, Hunt-Szymanski LCS, Zhang-Shasha tree-edit, Tarjan SCC, Wald SPRT, Jaccard-cosine boundary segmentation, contextual LLM bandit, agentproof DFA, sycophancy calibration). 21 named failure codes (F01–F21) with testable counters, mapped to a 5-axis hybrid taxonomy (memory / reflection / planning / action / system) and 21 incident-response runbooks. 9 adoption recipes (Claude Code, OpenAI Agents SDK, Cursor, LangChain, Pydantic-AI, BAML, raw system-prompt, eval-harnesses, stupid-agent-review). Zero runtime dependencies — pure prose + math, loadable into any system that accepts text instructions.
32+
**Technically:** 37 conduct modules across 7 conduct packages (`core` / `skills` / `orchestration` / `safety` / `web` / `memory` / `cost`), plus a `hooks` package shipping 15 runtime advisory hooks (the **enchanter-hooks** plugin, installable via the vis marketplace). 12 algorithmic engines with paper-backed derivations (Aho-Corasick pattern detection, Shannon entropy, Beta-Bernoulli trust scoring, Markov drift, Hunt-Szymanski LCS, Zhang-Shasha tree-edit, Tarjan SCC, Wald SPRT, Jaccard-cosine boundary segmentation, contextual LLM bandit, agentproof DFA, sycophancy calibration). 22 named failure codes (F01–F21 + F34, the taxonomy-doc'd set) with testable counters, mapped to a 5-axis hybrid taxonomy (memory / reflection / planning / action / system) and 22 incident-response runbooks. 10 adoption recipes (Claude Code, OpenAI Agents SDK, Cursor, LangChain, Pydantic-AI, BAML, raw system-prompt, eval-harnesses, stupid-agent-review, agent-runtime-boundaries). Zero runtime dependencies — pure prose + math, loadable into any system that accepts text instructions.
3333

3434
## Origin
3535

@@ -107,9 +107,9 @@ vis/
107107
│ │ └── CLAUDE.md ← repo-level instructions for agents editing core
108108
│ ├── skills/ ← author-facing skill conduct + host recipes
109109
│ │ ├── conduct/ ← formatting.md, skill-authoring.md
110-
│ │ └── recipes/ ← 8 adoption recipes (claude-code, openai-agents, cursor,
110+
│ │ └── recipes/ ← 9 adoption recipes (claude-code, openai-agents, cursor,
111111
│ │ langchain, pydantic-ai, baml, system-prompt,
112-
│ │ stupid-agent-review)
112+
│ │ stupid-agent-review, agent-runtime-boundaries)
113113
│ ├── orchestration/ ← multi-agent + engines
114114
│ │ ├── conduct/ ← eval-driven-self-improvement, multi-turn-negotiation,
115115
│ │ │ task-decomposition, inference-substrate
@@ -121,8 +121,8 @@ vis/
121121
│ │ └── templates/ ← bootstrap.sh / .ps1, sessionstart hook, vis-verify.yml
122122
│ ├── safety/ ← multi-agent + alignment cluster, compliance & operator wiring
123123
│ │ ├── conduct/ ← refusal-and-recovery.md
124-
│ │ ├── taxonomy/ ← F15–F21 (multi-agent + alignment)
125-
│ │ ├── runbooks/ ← F15–F21 incident-response runbooks
124+
│ │ ├── taxonomy/ ← F15–F21 (multi-agent + alignment) + F34 (untrusted-context injection)
125+
│ │ ├── runbooks/ ← F15–F21 + F34 incident-response runbooks
126126
│ │ ├── compliance/ ← SOC 2, ISO 42001, FedRAMP, NIST AI RMF readiness
127127
│ │ ├── security/ ← pentest + synthetic-fire artifacts
128128
│ │ └── operator-wiring-2026-05/ ← Day-1 Datadog / Sentry / PagerDuty / Slack / Splunk wiring
@@ -136,9 +136,11 @@ vis/
136136
│ │ ├── conduct/ ← cost-accounting.md, latency-budgeting.md
137137
│ │ └── recipes/ ← eval-harnesses.md
138138
│ └── hooks/ ← runtime advisory hooks (the enchanter-hooks plugin)
139-
│ ├── hooks/hooks.json ← 6 hooks: SessionStart(compact) / PreToolUse / PostToolUse
140-
│ ├── scripts/ ← compact-checkpoint, secret-scan, config-self-edit-guard,
141-
│ │ reversibility-guard, debug-hygiene, post-write-validate
139+
│ ├── hooks/hooks.json ← 15 hooks: SessionStart(compact) / PreToolUse / PostToolUse / SubagentStart / Stop
140+
│ ├── scripts/ ← compact-checkpoint, secret-scan, config-self-edit-guard, reversibility-guard,
141+
│ │ debug-hygiene, post-write-validate, …, context-taint-scan, dependency-intent-receipt,
142+
│ │ delegation-scope-guard, evidence-gate (v0.7)
143+
│ ├── tests/ ← verify-hooks.sh (package self-test: 74 checks)
142144
│ └── .claude-plugin/ ← plugin.json (installable via the vis marketplace)
143145
├── docs/ ← cross-cutting docs: architecture overview, ADRs
144146
│ (0001 four-layers, 0002 taxonomy expansion),
@@ -148,7 +150,7 @@ vis/
148150
└── package.json ← changesets meta-package for cross-repo versioning
149151
```
150152

151-
Counts as of the latest tag: **37 conduct modules** across core / skills / orchestration / safety / web / memory / cost · **12 engines** in `orchestration/engines/` · **21 failure codes** split F01–F14 (core) and F15–F21 (safety) · **21 runbooks** mirroring the F-codes · **9 recipes** (8 in `skills/recipes/` + `cost/recipes/eval-harnesses.md`) · **6 runtime advisory hooks** in `hooks/` (the **enchanter-hooks** plugin).
153+
Counts as of the latest tag: **37 conduct modules** across core / skills / orchestration / safety / web / memory / cost · **12 engines** in `orchestration/engines/` · **22 taxonomy-doc'd failure codes** split F01–F14 (core) and F15–F21 + F34 (safety) · **22 runbooks** mirroring those codes · **10 recipes** (9 in `skills/recipes/` + `cost/recipes/eval-harnesses.md`) · **15 runtime advisory hooks** in `hooks/` (the **enchanter-hooks** plugin).
152154

153155
---
154156

@@ -192,12 +194,13 @@ Don't load everything. Start with the failure mode you're seeing, pull only the
192194
| Working memory degrading across turns | `context.md` + `memory-hygiene.md` |
193195
| Subagent doesn't inherit conduct | `delegation.md` (Conduct propagation) |
194196
| Need runtime gates, not just rules | `hooks.md` (Starter patterns) + `packages/skills/recipes/claude-code.md` |
197+
| Agent obeys instructions hidden in files / tool output / retrieved docs | `packages/safety/taxonomy/f34-untrusted-context-injection.md` + `packages/skills/recipes/agent-runtime-boundaries.md` |
195198
| Latency unpredictable in long workflows | `latency-budgeting.md` |
196199
| Agent refuses benign requests / over-refuses | `refusal-and-recovery.md` |
197200
| Want to learn from observed failures | `eval-driven-self-improvement.md` + `precedent.md` |
198201
| User pressures across turns until you flip | `multi-turn-negotiation.md` + `doubt-engine.md` |
199202
| Doubt-engine F01-counter prose isn't measurable | `packages/orchestration/engines/calibration.md` |
200-
| Failure happened — need incident steps | `packages/core/runbooks/F<NN>.md` (F01–F14) or `packages/safety/runbooks/F<NN>.md` (F15–F21) |
203+
| Failure happened — need incident steps | `packages/core/runbooks/F<NN>.md` (F01–F14) or `packages/safety/runbooks/F<NN>.md` (F15–F21, F34) |
201204
| Want to A/B-validate a module's impact | `packages/orchestration/docs/self-test.md` |
202205
| Evaluating agent conduct | `packages/cost/recipes/eval-harnesses.md` |
203206

@@ -218,7 +221,7 @@ In your project's `CLAUDE.md`:
218221
- @shared/vis/packages/core/conduct/failure-modes.md
219222
```
220223

221-
For runtime enforcement (not just description), wire hooks per [`packages/skills/recipes/claude-code.md`](packages/skills/recipes/claude-code.md) § Enforcement wiring. The framework now includes copy-paste shell skeletons in [`packages/core/conduct/hooks.md`](packages/core/conduct/hooks.md) § Starter patterns — PreToolUse deny, PostToolUse inject, Stop notify. Or install them ready-made — `/plugin marketplace add enchanter-ai/vis` then `/plugin install enchanter-hooks@vis` — the **enchanter-hooks** plugin ships 6 advisory, fail-open hooks (post-compaction checkpoint, secret scan, config self-edit guard, reversibility guard, debug-hygiene, syntax validation) that activate without editing `settings.json`.
224+
For runtime enforcement (not just description), wire hooks per [`packages/skills/recipes/claude-code.md`](packages/skills/recipes/claude-code.md) § Enforcement wiring. The framework now includes copy-paste shell skeletons in [`packages/core/conduct/hooks.md`](packages/core/conduct/hooks.md) § Starter patterns — PreToolUse deny, PostToolUse inject, Stop notify. Or install them ready-made — `/plugin marketplace add enchanter-ai/vis` then `/plugin install enchanter-hooks@vis` — the **enchanter-hooks** plugin ships 15 advisory, fail-open hooks (post-compaction checkpoint + obligation anchor, secret scan, config self-edit guard, reversibility guard, debug-hygiene, syntax validation, context-taint scan, dependency-intent receipt, delegation-scope guard, evidence gate, and more) that activate without editing `settings.json`.
222225

223226
### OpenAI Agents SDK
224227

@@ -303,7 +306,7 @@ A conduct module the subagent never sees can't shape its behavior. [`packages/co
303306

304307
### A failure taxonomy that compounds
305308

306-
Free-text learning notes don't compound. Tagged ones do. The taxonomy ships 21 canonical codes split across two packages — F01–F14 in [`packages/core/taxonomy/`](packages/core/taxonomy/) (generation / action / reasoning) and F15–F21 in [`packages/safety/taxonomy/`](packages/safety/taxonomy/) (multi-agent + alignment). Each code has a precise signature, a testable counter, and an escalation rule:
309+
Free-text learning notes don't compound. Tagged ones do. The taxonomy ships 22 doc'd codes split across two packages — F01–F14 in [`packages/core/taxonomy/`](packages/core/taxonomy/) (generation / action / reasoning) and F15–F21 + F34 in [`packages/safety/taxonomy/`](packages/safety/taxonomy/) (multi-agent + alignment + trust-boundary). Each code has a precise signature, a testable counter, and an escalation rule:
307310

308311
**Generation failures**`packages/core/taxonomy/`
309312
- F01 Sycophancy · F02 Fabrication · F03 Context decay · F04 Task drift · F05 Instruction attenuation
@@ -315,7 +318,7 @@ Free-text learning notes don't compound. Tagged ones do. The taxonomy ships 21 c
315318
- F11 Reward hacking · F12 Degeneration loop · F13 Distractor pollution · F14 Version drift
316319

317320
**Multi-agent and alignment failures**`packages/safety/taxonomy/`
318-
- F15 Inter-agent misalignment · F16 Task-verification skip · F17 System-design brittleness · F18 Goal-conflict insider behavior · F19 Alignment faking *(awareness)* · F20 Sandbagging *(awareness)* · F21 Weaponized tool use
321+
- F15 Inter-agent misalignment · F16 Task-verification skip · F17 System-design brittleness · F18 Goal-conflict insider behavior · F19 Alignment faking *(awareness)* · F20 Sandbagging *(awareness)* · F21 Weaponized tool use · F34 Untrusted-context injection *(indirect prompt injection)*
319322

320323
Tag every entry in your failure log with one code. Now you can aggregate. Now you can learn.
321324

@@ -325,7 +328,7 @@ A parallel **5-axis layer** lives at [`packages/core/taxonomy/axes.md`](packages
325328

326329
### Adoption guides, not just docs
327330

328-
Recipes give you the wiring for seven host platforms plus an eval-harness reference. No hand-waving — concrete file paths, concrete config, a verification step you can actually run. Host recipes live in [`packages/skills/recipes/`](packages/skills/recipes/); the eval-harness reference lives in [`packages/cost/recipes/`](packages/cost/recipes/).
331+
Recipes give you the wiring for seven host platforms plus eval-harness, mechanical-review, and runtime-boundary references. No hand-waving — concrete file paths, concrete config, a verification step you can actually run. Host recipes live in [`packages/skills/recipes/`](packages/skills/recipes/); the eval-harness reference lives in [`packages/cost/recipes/`](packages/cost/recipes/).
329332

330333
| Recipe | What it covers |
331334
|--------|----------------|
@@ -338,6 +341,7 @@ Recipes give you the wiring for seven host platforms plus an eval-harness refere
338341
| [`system-prompt.md`](packages/skills/recipes/system-prompt.md) | Raw API / llama.cpp / Ollama wiring |
339342
| [`eval-harnesses.md`](packages/cost/recipes/eval-harnesses.md) | Benchmark suite reference: τ²-bench, AgentDojo, AgentHarm, SYCON-Bench, etc. |
340343
| [`stupid-agent-review.md`](packages/skills/recipes/stupid-agent-review.md) | Cheap-tier mechanical verifier auditing higher-tier output; the runtime behind A/B rule-efficacy testing |
344+
| [`agent-runtime-boundaries.md`](packages/skills/recipes/agent-runtime-boundaries.md) | Host-side trust boundaries: untrusted-content wrap, read-only default, host-enforced tool permissions, provenance across hand-offs — with a [boundary checklist](docs/evals/agent-boundary-checklist.md) |
341345

342346
---
343347

0 commit comments

Comments
 (0)