Audience: Claude. Lich answers the developer's sixth question — "Is this code good?" — via a five-engine review pipeline: static suspicion (M1 Cousot Interval Propagation), structural change comprehension (M2 Falleri Structural Diff), sandboxed confirmation (M5 Bounded Subprocess Dry-Run), Bayesian preference accumulation (M6), and LLM rubric judgment (M7 Zheng Pairwise Rubric). Catches runtime failures that pass compile time; learns per-developer preferences with Bayesian uncertainty; defers security-lane findings to Hydra and change-classification to Crow.
These apply to every skill in every plugin. Load once; do not re-derive.
- @../vis/packages/core/conduct/discipline.md — coding conduct: think-first, simplicity, surgical edits, goal-driven loops
- @../vis/packages/core/conduct/capability-fidelity.md — contracts survive capability gaps: recover, escalate, or abort; never silently substitute
- @../vis/packages/core/conduct/doubt-engine.md — adversarial self-check before agreement; fires on every "yes" and self-claim
- @../vis/packages/core/conduct/context.md — attention-budget hygiene, U-curve placement, checkpoint protocol
- @../vis/packages/core/conduct/verification.md — independent checks, baseline snapshots, dry-run for destructive ops
- @../vis/packages/core/conduct/verdict-calibration.md — every verdict (DEPLOY/PASS/COMPLETE/VERIFIED) carries n, sampling method, and a calibration qualifier; vis-side abstraction over the wixie DEPLOY bar
- @../vis/packages/core/conduct/delegation.md — subagent contracts, tool whitelisting, parallel vs. serial rules
- @../vis/packages/core/conduct/failure-modes.md — 14-code taxonomy for accumulated-learning logs
- @../vis/packages/core/conduct/tool-use.md — tool-choice hygiene, error payload contract, parallel-dispatch rules
- @../vis/packages/skills/conduct/formatting.md — per-target format (XML/Markdown/minimal/few-shot), prefill + stop sequences
- @../vis/packages/skills/conduct/skill-authoring.md — SKILL.md frontmatter discipline, discovery test
- @../vis/packages/core/conduct/hooks.md — advisory-only hooks, injection over denial, fail-open
- @../vis/packages/core/conduct/metacognition.md — periodic goal-restate; fires every K=8 tool-uses or on user meta-question
- @../vis/packages/core/conduct/precedent.md — log self-observed failures to
state/precedent-log.md; consult before risky steps - @../vis/packages/core/conduct/precedent-freshness.md — verify self-authored memory/precedent/briefings before relying on them: Class-A surfaces (path/function/flag) get a Glob/Grep existence check; Class-B snapshots get a git-log freshness check; Class-C feedback rules are trusted unless contradicted
- @../vis/packages/core/conduct/prior-art-discovery.md — F28 counter: run the 5-target discovery pass (shared/scripts, packages/*/skills, state/proposals, slug-glob, signature-grep) before authoring a new tool/script/skill/module
- @../vis/packages/core/conduct/reversibility-foresight.md — classify action reversibility (trivial/costly/impossible) before acting; confirmation scales with tier
- @../vis/packages/core/conduct/substrate-consumption.md — read-side complement to precedent.md: consume briefing, MEMORY, learnings, and precedent before acting; counter to F24 substrate-blindness
- @../vis/packages/core/conduct/sunk-cost-iteration.md — stop-and-re-ask after 2 INCONCLUSIVE/BLOCKED results on the same artifact; iteration is not an authorization to keep patching
- @../vis/packages/core/conduct/tier-sizing.md — prompt verbosity scales inversely with model tier; Haiku needs mechanical steps, Opus runs on intent
- @../vis/packages/web/conduct/web-fetch.md — external URL handling: cache, dedup, budget; WebFetch is Haiku-tier-only
When a module conflicts with a plugin-local instruction, the plugin wins — but log the override.
Lich is hybrid-triggered. PostToolUse hooks auto-review changes on Write/Edit/MultiEdit; skill-invocation is the manual handle for ad-hoc deep reviews.
| Event or Skill | Sub-plugin | Role |
|---|---|---|
| PostToolUse (Write|Edit|MultiEdit) | lich-core |
Run M1 Cousot Interval Propagation + M2 Falleri Structural Diff on touched files |
| PostToolUse (Write|Edit|MultiEdit) | lich-sandbox |
Run M5 Bounded Subprocess Dry-Run on M1-flagged call sites (Unix-only) |
| PostToolUse (Write|Edit|MultiEdit) | lich-preference |
Update M6 Beta posteriors on developer accept/reject signals |
| PostToolUse (Write|Edit|MultiEdit) | lich-rubric |
Run M7 Zheng Pairwise Rubric Judgment with position-swap debiasing |
| PostToolUse (end of PR) | lich-verdict |
Compose M1-M7 outputs into DEPLOY/HOLD/FAIL verdict, emit lich.review.completed |
/lich-review <scope> |
lich-core |
On-demand deep review, aggregating all sub-plugins |
/lich-explain <finding_id> |
lich-rubric |
Walk through why M1/M5/M7 flagged something |
/lich-disable <rule_id> |
lich-preference |
Permanent suppression with auto-reprompt quarterly |
See ./plugins/<name>/hooks/hooks.json for matchers. Agents in ./plugins/<name>/agents/.
M1 Cousot Interval Propagation · M2 Falleri Structural Diff · M5 Bounded Subprocess Dry-Run · M6 Bayesian Preference Accumulation · M7 Zheng Pairwise Rubric Judgment. Derivations in docs/science/README.md. Defining engine: M5 Bounded Subprocess Dry-Run — the static-suspicion → sandboxed-confirmation pipeline is the novel moat no existing reviewer ships at zero-external-dep weight.
| ID | Name | Plugin | Algorithm |
|---|---|---|---|
| M1 | Cousot Interval Propagation | lich-core | Abstract interpretation over interval + nullability + container-shape lattices with threshold widening (N=3) |
| M2 | Falleri Structural Diff | lich-core | GumTree two-phase AST matching (top-down hash + bottom-up Dice containment) |
| M5 | Bounded Subprocess Dry-Run | lich-sandbox | Stdlib resource.setrlimit (CPU/AS/NOFILE/FSIZE) + signal.alarm + subprocess sandbox on Unix |
| M6 | Bayesian Preference Accumulation | lich-preference | Beta-Binomial Thompson sampling per (developer, rule) with 5% minimum surfacing floor |
| M7 | Zheng Pairwise Rubric Judgment | lich-rubric | 5-axis rubric + position-swap debiasing + Cohen's Kappa inter-judge reliability |
Phase 2 adds M3 Yamaguchi Property-Graph Traversal (Joern CPG), M4 Type-Reflected Invariant Synthesis (Hypothesis-ghostwriter upgrade to M5 inputs), Schleimer Winnowing Clone Detection, O'Hearn Separation-Logic Bi-Abduction (Java/C++/ObjC), and Cohort Similarity Borrowing (M6 cold-start).
Markers: [H] hook-enforced (deterministic) · [A] advisory (relies on your adherence).
-
[H] IMPORTANT — Lich never re-scans CWE-tagged security findings. Hydra's R3 OWASP Vulnerability Graph owns CWE taxonomy (98 CWEs across 2,011 patterns). If Hydra's
plugins/vuln-detector/state/audit.jsonlhas a finding on the file, Lich boosts M6 review-attention weight for that file and annotates M7's rubric input with "Security context: Hydra flagged {cwe} {severity}" — but never re-classifies or re-reports the CWE. The non-duplication contract with Hydra is load-bearing; breaking it fractures the severity source of truth across two plugins. -
[H] YOU MUST NOT relax M5 sandbox resource caps. The caps are
RLIMIT_CPU=5s,RLIMIT_AS=512MB,RLIMIT_NOFILE=16,RLIMIT_FSIZE=10MB,signal.alarm=10s. These are load-bearing — relaxing any cap converts the sandbox into an arbitrary-code-execution risk on every PR. Changes require a documented security review. Windows platforms skip M5 entirely and emitplatform-unsupportedin the verdict; never silently pretend M5 ran. -
[A] YOU MUST report Cohen's Kappa alongside M7 scores. The honest-numbers contract is the product. When M7's two judge passes disagree beyond the per-axis threshold (Kappa < 0.4), the axis is flagged "unstable" in the PDF report — never collapsed into a hidden average. A single bare M7 score without its Kappa is a contract violation.
-
[A] YOU MUST respect M6's 5% sampling floor. A rule's Beta posterior can deprioritize but must still surface at ≥ 5% Thompson probability. One rejection does not kill a rule. Permanent suppression is the developer's explicit action via
/lich-disable <rule>— which writes toplugins/lich-preference/state/overrides.jsonwith a dated quarterly re-prompt. Do not drop a rule to 0% through accumulated rejections alone. -
[A] ESCALATE on confirmed runtime failure. When M5 confirms a bug with a concrete witness input (e.g.,
n=0for a div-zero site), the verdict is FAIL, not HOLD. Confirmed bugs are facts, not probabilities — they do not get averaged with rubric scores. -
[A] Consume Crow's V1 + V2 output, never re-classify. Crow's
plugins/change-tracker/state/audit.jsonlis the authoritative change classifier. Lich reviews what Crow flagged; Lich does not peer-classify. If Crow scored a changetrust=0.62, that weight propagates into M6's attention prior unchanged.
| Verdict | M1 condition | M5 condition | M6 condition | M7 condition | Action |
|---|---|---|---|---|---|
| DEPLOY | All flagged sites severity < HIGH | No confirmed runtime failure | ≥ 80% surfaced findings posterior mean > 0.5 | All 5 axes ≥ 3.5/5 AND Kappa ≥ 0.4 | Silent pass; write to state/verdict.jsonl |
| HOLD | 1-2 HIGH flags, no CRITICAL | Any timeout-without-confirmation | ≥ 50% surfaced posterior > 0.3 | Any axis < 3.5 OR Kappa < 0.4 | Surface to reviewer; Sylph warns |
| FAIL | Any CRITICAL OR ≥ 3 HIGH flags | Any confirmed runtime failure | (posterior cannot downgrade from HOLD to FAIL) | Any axis ≤ 2 OR > 2 axes < 3 | Block; Sylph refuses auto-commit |
| State file | Owner | Purpose |
|---|---|---|
plugins/lich-core/state/learnings.json |
lich-core | Per-session learnings for Gauss Accumulation (M1/M2 parameter tuning) |
plugins/lich-core/state/precedent-log.md |
lich-core | Self-observed operational failures (see @../vis/packages/core/conduct/precedent.md) |
plugins/lich-sandbox/state/run-log.jsonl |
lich-sandbox | M5 sandbox run history (CPU used, timeouts, confirmed bugs) |
plugins/lich-preference/state/learnings.json |
lich-preference | Per-(developer, rule) Beta(α, β) posteriors |
plugins/lich-preference/state/overrides.json |
lich-preference | Developer-disabled rules with quarterly re-prompt dates |
plugins/lich-rubric/config/rubric-v1.json |
lich-rubric | Versioned M7 rubric definition (ship-time config) |
plugins/lich-rubric/state/kappa-log.jsonl |
lich-rubric | Per-axis Cohen's Kappa history (runtime, gitignored) |
plugins/lich-verdict/state/verdict.jsonl |
lich-verdict | Append-only DEPLOY/HOLD/FAIL record per review |
shared/learnings.json |
exporter | Cross-plugin aggregated learnings |
| Tier | Model | Used for |
|---|---|---|
| Orchestrator | Opus | M7 disagreement adjudication; cross-engine verdict synthesis when engines conflict |
| Executor | Sonnet | M7 default judge; lich-core analyzer loops; M2 structural diff when LOC > 1k |
| Validator | Haiku | M7 budget fallback (when Pech fires pech.budget.threshold.crossed); rubric-schema freshness; M6 posterior integrity audit |
Respect the tiering. Routing a Haiku validation task to Opus burns budget and breaks the cost contract with Pech.
- Duplicating Hydra's R3. Re-scanning for CWE-89 / CWE-79 / CWE-918 or any other CWE-tagged security finding in M1. Counter: M1 is scoped to correctness (div-zero, null deref, OOB, overflow, resource balance). Security taint sinks stay Hydra's lane.
- Silent M5 skip on Windows. Emitting a green verdict when M5 didn't run because
resourceis absent. Counter: honest noteplatform-unsupportedin the verdict; M1-only runtime-failure judgment with reduced confidence. - Bare M7 score without Kappa. Averaging two judge runs silently when they disagree. Counter: report Kappa per axis; flag axes with Kappa < 0.4 as "unstable" in PDF.
- Rule death from accumulated rejections. Dropping a rule's surface probability to 0 without the developer's explicit
/lich-disable. Counter: Thompson sampling with 5% floor. - Unbounded M5 sandbox. Running developer code without
resource.setrlimitcaps. Counter: the five caps (CPU/AS/NOFILE/FSIZE + signal.alarm) are load-bearing; relaxation requires documented security review. - Peer-classifying changes that Crow already classified. Re-running semantic diff in Lich when Crow's V1 is authoritative. Counter: consume Crow's output, propagate its trust score into M6 priors.
- M7 self-preference in the judge model family. Using the same model family to judge its own output. Counter: when the target of review was generated by Claude (usually the case in Claude Code sessions), surface this in the PDF as a known bias vector, not hidden.
- Zero external runtime deps. Hooks: bash + jq only. Scripts: Python 3.8+ stdlib only. No npm/pip/cargo at runtime (dev-only renderer deps in
docs/assets/package.jsonare the one exception, and are never imported from plugin code). - Managed agent tiers. Opus = orchestrator/judgment. Sonnet = executor/loops. Haiku = validator/format.
- Named formal algorithm per engine. ID prefix letter + number. Academic-style name:
[Method] [Domain] [Action]. - Emu-style marketplace. Each sub-plugin ships
.claude-plugin/plugin.json+{agents,commands,hooks,skills,state}/+README.md. - Dark-themed PDF report. Produced by
docs/architecture/generate.py+puppeteer.config.jsonon final release. - Gauss Accumulation learning. Per-session learnings at
plugins/<name>/state/learnings.json; exported toshared/learnings.json. - enchanted-mcp event bus. Inter-plugin coordination via published/subscribed events namespaced
lich.<event>. - Diagrams from source of truth.
docs/architecture/generate.pyreadsplugin.json+hooks.json+SKILL.mdfrontmatter → writes four mermaid diagrams +index.html. Diagrams are never hand-edited.
Events this plugin publishes: lich.review.completed, lich.rule.disabled, lich.sandbox.failed
Events this plugin subscribes to: crow.change.classified, hydra.vuln.detected, pech.budget.threshold.crossed, emu.runway.threshold.crossed