You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Harden VIS for prompt-leak resilience and runtime agent boundaries.
6
+
7
+
-**New failure code F34 — Untrusted-context injection** (indirect prompt injection): the canonical, host-agnostic code for "content from a data channel obeyed as a principal instruction," generalizing the web-only sub-code F13.1. Full taxonomy doc (`packages/safety/taxonomy/f34-untrusted-context-injection.md`) + runbook (`packages/safety/runbooks/F34.md`), registered in `failure-modes.md` and the taxonomy index.
8
+
-**New recipe `agent-runtime-boundaries.md`**: host-side trust boundaries — untrusted-content wrap, read-only default, host-enforced tool permissions, provenance preservation across multi-agent hand-offs (anti-laundering), with a tool permission matrix and example host manifest. Encodes the principle that the system prompt is not a security boundary; security-critical enforcement lives in the runtime.
9
+
-**New eval artifact `docs/evals/agent-boundary-checklist.md`**: six adversarial pass/fail cases (reveal-system-prompt, indirect instruction in retrieved content, tool call requested by untrusted content, rewritten risky request across agents, state-change without approval, false-completion claim).
10
+
-**enchanter-hooks v0.7 — the runtime-enforcement layer of the above**: four new advisory, fail-open hooks (11 → 15) tied to deterministic Claude Code hook events. `context-taint-scan` (`PostToolUse(Read|Grep|WebFetch)`) flags directive language in retrieved `tool_response` — the runtime half of F34's counter. `delegation-scope-guard` (`SubagentStart`) injects a scope+provenance reminder into a risky subagent's own context (anti-laundering, per `agent-runtime-boundaries.md`). `evidence-gate` (`Stop`) flags unbacked completion claims (the boundary-checklist false-completion case). `dependency-intent-receipt` (`PreToolUse`) asks for supply-chain provenance on dep changes. Plus an obligation-anchor extension to `compact-checkpoint` (approvals / denied approaches / security boundaries / verification debt). New `packages/hooks/tests/verify-hooks.sh` self-test (74 checks).
> A 12-line `paginate()` function has an off-by-one. The PR title says *"fix the off-by-one in pagination."* The agent rewrites it as a `Paginator` class, adds a docstring nobody asked for, renames `perPage` to `n`, and slips the actual one-character fix onto line 4. Type-check passes. Tests pass. The bug is fixed, but the codebase grew a new pattern nobody decided on, and the diff buries the fix under 30 lines of unsolicited refactor (F04 task drift).
23
23
>
@@ -29,7 +29,7 @@ The behavioral substrate for building durable AI agents — conduct, engines, ta
29
29
30
30
**In plain English:** Most agent stacks ship with prompts, tools, and hopes. The thing that actually keeps an agent from refactoring code you didn't ask it to touch, or pushing to main after you said not to, isn't another tool — it's a behavior rule that survives the long context. vis is the dependency-free pile of those rules, plus the math, taxonomy, and host recipes around them.
31
31
32
-
**Technically:** 37 conduct modules across 7 conduct packages (`core` / `skills` / `orchestration` / `safety` / `web` / `memory` / `cost`), plus a `hooks` package shipping 6 runtime advisory hooks (the **enchanter-hooks** plugin, installable via the vis marketplace). 12 algorithmic engines with paper-backed derivations (Aho-Corasick pattern detection, Shannon entropy, Beta-Bernoulli trust scoring, Markov drift, Hunt-Szymanski LCS, Zhang-Shasha tree-edit, Tarjan SCC, Wald SPRT, Jaccard-cosine boundary segmentation, contextual LLM bandit, agentproof DFA, sycophancy calibration). 21 named failure codes (F01–F21) with testable counters, mapped to a 5-axis hybrid taxonomy (memory / reflection / planning / action / system) and 21 incident-response runbooks. 9 adoption recipes (Claude Code, OpenAI Agents SDK, Cursor, LangChain, Pydantic-AI, BAML, raw system-prompt, eval-harnesses, stupid-agent-review). Zero runtime dependencies — pure prose + math, loadable into any system that accepts text instructions.
32
+
**Technically:** 37 conduct modules across 7 conduct packages (`core` / `skills` / `orchestration` / `safety` / `web` / `memory` / `cost`), plus a `hooks` package shipping 15 runtime advisory hooks (the **enchanter-hooks** plugin, installable via the vis marketplace). 12 algorithmic engines with paper-backed derivations (Aho-Corasick pattern detection, Shannon entropy, Beta-Bernoulli trust scoring, Markov drift, Hunt-Szymanski LCS, Zhang-Shasha tree-edit, Tarjan SCC, Wald SPRT, Jaccard-cosine boundary segmentation, contextual LLM bandit, agentproof DFA, sycophancy calibration). 22 named failure codes (F01–F21 + F34, the taxonomy-doc'd set) with testable counters, mapped to a 5-axis hybrid taxonomy (memory / reflection / planning / action / system) and 22 incident-response runbooks. 10 adoption recipes (Claude Code, OpenAI Agents SDK, Cursor, LangChain, Pydantic-AI, BAML, raw system-prompt, eval-harnesses, stupid-agent-review, agent-runtime-boundaries). Zero runtime dependencies — pure prose + math, loadable into any system that accepts text instructions.
For runtime enforcement (not just description), wire hooks per [`packages/skills/recipes/claude-code.md`](packages/skills/recipes/claude-code.md) § Enforcement wiring. The framework now includes copy-paste shell skeletons in [`packages/core/conduct/hooks.md`](packages/core/conduct/hooks.md) § Starter patterns — PreToolUse deny, PostToolUse inject, Stop notify. Or install them ready-made — `/plugin marketplace add enchanter-ai/vis` then `/plugin install enchanter-hooks@vis` — the **enchanter-hooks** plugin ships 6 advisory, fail-open hooks (post-compaction checkpoint, secret scan, config self-edit guard, reversibility guard, debug-hygiene, syntax validation) that activate without editing `settings.json`.
224
+
For runtime enforcement (not just description), wire hooks per [`packages/skills/recipes/claude-code.md`](packages/skills/recipes/claude-code.md) § Enforcement wiring. The framework now includes copy-paste shell skeletons in [`packages/core/conduct/hooks.md`](packages/core/conduct/hooks.md) § Starter patterns — PreToolUse deny, PostToolUse inject, Stop notify. Or install them ready-made — `/plugin marketplace add enchanter-ai/vis` then `/plugin install enchanter-hooks@vis` — the **enchanter-hooks** plugin ships 15 advisory, fail-open hooks (post-compaction checkpoint + obligation anchor, secret scan, config self-edit guard, reversibility guard, debug-hygiene, syntax validation, context-taint scan, dependency-intent receipt, delegation-scope guard, evidence gate, and more) that activate without editing `settings.json`.
222
225
223
226
### OpenAI Agents SDK
224
227
@@ -303,7 +306,7 @@ A conduct module the subagent never sees can't shape its behavior. [`packages/co
303
306
304
307
### A failure taxonomy that compounds
305
308
306
-
Free-text learning notes don't compound. Tagged ones do. The taxonomy ships 21 canonical codes split across two packages — F01–F14 in [`packages/core/taxonomy/`](packages/core/taxonomy/) (generation / action / reasoning) and F15–F21 in [`packages/safety/taxonomy/`](packages/safety/taxonomy/) (multi-agent + alignment). Each code has a precise signature, a testable counter, and an escalation rule:
309
+
Free-text learning notes don't compound. Tagged ones do. The taxonomy ships 22 doc'd codes split across two packages — F01–F14 in [`packages/core/taxonomy/`](packages/core/taxonomy/) (generation / action / reasoning) and F15–F21 + F34 in [`packages/safety/taxonomy/`](packages/safety/taxonomy/) (multi-agent + alignment + trust-boundary). Each code has a precise signature, a testable counter, and an escalation rule:
Tag every entry in your failure log with one code. Now you can aggregate. Now you can learn.
321
324
@@ -325,7 +328,7 @@ A parallel **5-axis layer** lives at [`packages/core/taxonomy/axes.md`](packages
325
328
326
329
### Adoption guides, not just docs
327
330
328
-
Recipes give you the wiring for seven host platforms plus an eval-harness reference. No hand-waving — concrete file paths, concrete config, a verification step you can actually run. Host recipes live in [`packages/skills/recipes/`](packages/skills/recipes/); the eval-harness reference lives in [`packages/cost/recipes/`](packages/cost/recipes/).
331
+
Recipes give you the wiring for seven host platforms plus eval-harness, mechanical-review, and runtime-boundary references. No hand-waving — concrete file paths, concrete config, a verification step you can actually run. Host recipes live in [`packages/skills/recipes/`](packages/skills/recipes/); the eval-harness reference lives in [`packages/cost/recipes/`](packages/cost/recipes/).
329
332
330
333
| Recipe | What it covers |
331
334
|--------|----------------|
@@ -338,6 +341,7 @@ Recipes give you the wiring for seven host platforms plus an eval-harness refere
338
341
|[`system-prompt.md`](packages/skills/recipes/system-prompt.md)| Raw API / llama.cpp / Ollama wiring |
339
342
|[`eval-harnesses.md`](packages/cost/recipes/eval-harnesses.md)| Benchmark suite reference: τ²-bench, AgentDojo, AgentHarm, SYCON-Bench, etc. |
0 commit comments