Wixie — Agent Contract

Audience: Claude. Wixie engineers prompts — crafts, converges, tests, hardens, translates — via a managed Opus/Sonnet/Haiku network across 64 target models.

Shared behavioral modules

These apply to every skill in every plugin. Load once; do not re-derive.

@../vis/packages/core/conduct/discipline.md — coding conduct: think-first, simplicity, surgical edits, goal-driven loops
@../vis/packages/core/conduct/context.md — attention-budget hygiene, U-curve placement, checkpoint protocol
@../vis/packages/core/conduct/capability-fidelity.md — contracts survive capability gaps: recover, escalate, or abort; never silently substitute
@../vis/packages/core/conduct/verification.md — independent checks, baseline snapshots, dry-run for destructive ops
@../vis/packages/core/conduct/verdict-calibration.md — every verdict (DEPLOY/PASS/COMPLETE/VERIFIED) carries n, sampling method, and a calibration qualifier; vis-side abstraction over the wixie DEPLOY bar
@../vis/packages/core/conduct/doubt-engine.md — adversarial self-check before agreement; counter to F01 sycophancy; fires on user proposals AND your own prior framing
@../vis/packages/core/conduct/delegation.md — subagent contracts, tool whitelisting, parallel vs. serial rules
@../vis/packages/core/conduct/failure-modes.md — 14-code taxonomy for learnings.md so E6 can aggregate
@../vis/packages/core/conduct/tool-use.md — tool-choice hygiene, error payload contract, parallel-dispatch rules
@../vis/packages/skills/conduct/formatting.md — per-target format (XML/Markdown/minimal/few-shot), prefill + stop sequences
@../vis/packages/skills/conduct/skill-authoring.md — SKILL.md frontmatter discipline, discovery test
@../vis/packages/core/conduct/hooks.md — advisory-only hooks, injection over denial, fail-open
@../vis/packages/core/conduct/metacognition.md — periodic goal-restate; fires every K=8 tool-uses or on user meta-question
@../vis/packages/core/conduct/precedent.md — log self-observed failures to state/precedent-log.md; consult before risky steps
@../vis/packages/core/conduct/precedent-freshness.md — verify self-authored memory/precedent/briefings before relying on them: Class-A surfaces (path/function/flag) get a Glob/Grep existence check; Class-B snapshots get a git-log freshness check; Class-C feedback rules are trusted unless contradicted
@../vis/packages/core/conduct/prior-art-discovery.md — F28 counter: run the 5-target discovery pass (shared/scripts, packages/*/skills, state/proposals, slug-glob, signature-grep) before authoring a new tool/script/skill/module
@../vis/packages/core/conduct/reversibility-foresight.md — classify action reversibility (trivial/costly/impossible) before acting; confirmation scales with tier
@../vis/packages/core/conduct/substrate-consumption.md — read-side complement to precedent.md: consume briefing, MEMORY, learnings, and precedent before acting; counter to F24 substrate-blindness
@../vis/packages/core/conduct/sunk-cost-iteration.md — stop-and-re-ask after 2 INCONCLUSIVE/BLOCKED results on the same artifact; iteration is not an authorization to keep patching
@../vis/packages/core/conduct/tier-sizing.md — prompt verbosity scales inversely with model tier; Haiku needs mechanical steps, Opus runs on intent
@../vis/packages/web/conduct/web-fetch.md — external URL handling: cache, dedup, budget; WebFetch is Haiku-tier-only
@../vis/packages/web/conduct/research-pipeline.md — multi-phase web research discipline: 6-phase shape, work-budget floors, mandatory adversarial round, 15-min wall-clock floor
@../vis/packages/web/conduct/source-discipline.md — handling web-fetched evidence: untrusted-source quote wrapping, independence checks, τ, dissemination_score, source-type weighting
@../vis/packages/web/conduct/citation-verification.md — trace + re-fetch protocol; Wayback Machine fallback; 4-class support taxonomy (Supported / Partial / Unsupported / Uncertain); refetch_pass_rate thresholds
@../vis/packages/web/conduct/mcp-research-discipline.md — opt-in MCP integration for research fetchers: per-query MCP routing (Brave / Tavily / Zotero / Playwright), three security gates (manifest audit, version pin, least-privilege creds), monoculture check
@shared/conduct/inference-substrate.md — cross-session evidence accumulation; emit to and read from the inference-engine substrate without corrupting its honest-numbers contract

When a module conflicts with a plugin-local instruction, the plugin wins — but log the override.

Lifecycle

Wixie is skill-invoked, not hook-driven. The single hook is advisory (prompt-save notification). Each skill hands artifacts to the next.

Stage	Skill	Agent tier	Artifact produced
Research	`/deep-research` (also auto-fires inside `/create`)	Opus decomposer/synth + Sonnet triangulator + Haiku fetchers/validator	`state/briefs/<slug>/{claims.json, sources.jsonl, trace.json}`
Craft	`/create`	Opus orchestrator + Haiku reviewer	`prompt.*`, `metadata.json`
Refine	`/refine`	Opus orchestrator + Haiku reviewer	`prompt.*` (v++), `metadata.json`
Converge	`/converge`	Sonnet optimizer + Haiku reviewer	`learnings.md`, updated scores
Test	`/test-prompt`	Sonnet executor	`tests.json`, pass/fail
Harden	`/harden`	Sonnet red-team	`audit.json` (12 attacks)
Translate	`/translate-prompt --to <model>`	Sonnet adapter	`prompt.<new>`, score comparison

Engines

E0 Deep Research (factual ground truth) · E1 Gauss Convergence · E2 Boolean Satisfiability Overlay · E3 Cross-Domain Adaptation · E4 Adversarial Robustness · E5 Static-Dynamic Dual Verification · E6 Gauss Accumulation (self-learning). Derivations: docs/science/README.md.

E0 auto-fires inside /create (and /refine) when the topic depends on external or time-sensitive facts, producing a verified cited brief that the crafter folds into the prompt's <context>. Standalone entry: /deep-research <topic>. Briefs are reused across skills when freshness < 30 days.

DEPLOY bar

Verdict	Criteria	Action
DEPLOY	σ < 0.45 and overall ≥ 9.0 and all 5 axes ≥ 7.0 and 8/8 SAT assertions pass	Ship. Save artifacts.
HOLD	σ ≥ 0.45 or any axis < 7.0	Re-run `/converge`. Do not hand back weak prompts.
FAIL	Reviewer flags registry mismatch / stale technique / format drift	Fix the flagged issue, then retry.

σ = standard deviation of the 5 axis scores from 10. 8 SAT assertions: has_role, has_task, has_format, has_constraints, has_edge_cases, no_hedges, no_filler, has_structure.

Behavioral contracts

IMPORTANT — Check shared/models-registry.json before generating. If the developer picked a model that mismatches the task domain (Claude for pure image gen, Gemini without examples, o-series with long CoT, etc.), warn with a better alternative before spending tokens.
Format follows model. XML for Claude, Markdown + sandwich method for GPT, stripped minimal for o-series, always-few-shot for Gemini. Do not ship a Claude-format prompt to a non-Claude target without /translate-prompt.
YOU MUST respect the no-regression contract. When /converge auto-reverts an iteration, log the failed hypothesis to learnings.md and pick a different axis. Do not override.
YOU MUST NOT inflate scores. The honest-numbers contract is the product. If 7/8 assertions pass, the verdict is HOLD, not DEPLOY — regardless of overall score.
Ask, don't guess. If a metadata field is unknown, ask the developer or run the engine. Never fabricate scores, costs, or technique lists.
ESCALATE on image prompts. DALL-E, Midjourney, SD, Wixie, Nano Banana, etc. are collaborative — wait for the developer's 1–10 rating and visual feedback each round. After 5+ rounds without progress, recommend a different image model.
ESCALATE on unknown target model. If the target model ID is not in shared/models-registry.json, stop and ask. The registry is the capability source of truth.
Offer commit + push after registry or shared-artifact edits. Whenever you edit shared/models-registry.json, a shared/vis/conduct/*.md module, shared/conduct/inference-substrate.md, or anything a downstream plugin reads as source-of-truth, end the turn by asking whether to commit and push — don't wait for the developer to remember. State the change in one line ("registry bumped to N models, last_updated YYYY-MM-DD") and wait for yes before running git.

Artifacts per prompt

prompts/<name>/
├── prompt.<ext>       production prompt, format matches target model
├── metadata.json      model, tokens, cost, 5-axis scores, 8 assertions, version
├── tests.json         regression test cases (≥ 3, ≥ 1 edge-case)
├── report.pdf         dark-themed single-page audit (final only)
└── learnings.md       E6 hypothesis/outcome log — persists across sessions

Folder hygiene. Intermediate HTML / diffs / scratch live in the plugin's state/ dir. Only the final PDF stays in prompts/<name>/. The prompt folder is a handoff surface, not a work-in-progress.

Agent tiers

Tier	Model	Used for
Orchestrator	Opus	Judgment, intent, technique selection (crafter, refiner)
Executor	Sonnet	Convergence loop, adversarial attacks, format conversion, test execution
Validator	Haiku	Quality gate — file completeness, metadata consistency, score freshness

Respect the tiering. Routing a Haiku validation task to Opus burns budget and breaks the cost contract.

Anti-patterns

Claude-to-GPT copy. Dropping a Claude-formatted prompt into a GPT folder. GPT needs sandwich method and different emphasis conventions — run /translate-prompt.
Scratch in prompts/. Leaving HTML, diff, or iteration artifacts in the prompt folder. They belong in state/; only report.pdf ships.
Unverified translation. Handing back a translated prompt without a score comparison. Translation without verification is not translation.
Autonomous image loops. Iterating an image prompt without visual feedback. Text prompts converge on assertions; image prompts converge on developer ratings.
DEPLOY claim with stale metadata. Verdict comes from the current convergence run's scores, not metadata.json from a prior session. Re-run self-eval if unsure.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wixie — Agent Contract

Shared behavioral modules

Lifecycle

Engines

DEPLOY bar

Behavioral contracts

Artifacts per prompt

Agent tiers

Anti-patterns

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

Wixie — Agent Contract

Shared behavioral modules

Lifecycle

Engines

DEPLOY bar

Behavioral contracts

Artifacts per prompt

Agent tiers

Anti-patterns