Audience: Claude. Wixie engineers prompts — crafts, converges, tests, hardens, translates — via a managed Opus/Sonnet/Haiku network across 64 target models.
These apply to every skill in every plugin. Load once; do not re-derive.
- @../vis/packages/core/conduct/discipline.md — coding conduct: think-first, simplicity, surgical edits, goal-driven loops
- @../vis/packages/core/conduct/context.md — attention-budget hygiene, U-curve placement, checkpoint protocol
- @../vis/packages/core/conduct/capability-fidelity.md — contracts survive capability gaps: recover, escalate, or abort; never silently substitute
- @../vis/packages/core/conduct/verification.md — independent checks, baseline snapshots, dry-run for destructive ops
- @../vis/packages/core/conduct/verdict-calibration.md — every verdict (DEPLOY/PASS/COMPLETE/VERIFIED) carries n, sampling method, and a calibration qualifier; vis-side abstraction over the wixie DEPLOY bar
- @../vis/packages/core/conduct/doubt-engine.md — adversarial self-check before agreement; counter to F01 sycophancy; fires on user proposals AND your own prior framing
- @../vis/packages/core/conduct/delegation.md — subagent contracts, tool whitelisting, parallel vs. serial rules
- @../vis/packages/core/conduct/failure-modes.md — 14-code taxonomy for
learnings.mdso E6 can aggregate - @../vis/packages/core/conduct/tool-use.md — tool-choice hygiene, error payload contract, parallel-dispatch rules
- @../vis/packages/skills/conduct/formatting.md — per-target format (XML/Markdown/minimal/few-shot), prefill + stop sequences
- @../vis/packages/skills/conduct/skill-authoring.md — SKILL.md frontmatter discipline, discovery test
- @../vis/packages/core/conduct/hooks.md — advisory-only hooks, injection over denial, fail-open
- @../vis/packages/core/conduct/metacognition.md — periodic goal-restate; fires every K=8 tool-uses or on user meta-question
- @../vis/packages/core/conduct/precedent.md — log self-observed failures to
state/precedent-log.md; consult before risky steps - @../vis/packages/core/conduct/precedent-freshness.md — verify self-authored memory/precedent/briefings before relying on them: Class-A surfaces (path/function/flag) get a Glob/Grep existence check; Class-B snapshots get a git-log freshness check; Class-C feedback rules are trusted unless contradicted
- @../vis/packages/core/conduct/prior-art-discovery.md — F28 counter: run the 5-target discovery pass (shared/scripts, packages/*/skills, state/proposals, slug-glob, signature-grep) before authoring a new tool/script/skill/module
- @../vis/packages/core/conduct/reversibility-foresight.md — classify action reversibility (trivial/costly/impossible) before acting; confirmation scales with tier
- @../vis/packages/core/conduct/substrate-consumption.md — read-side complement to precedent.md: consume briefing, MEMORY, learnings, and precedent before acting; counter to F24 substrate-blindness
- @../vis/packages/core/conduct/sunk-cost-iteration.md — stop-and-re-ask after 2 INCONCLUSIVE/BLOCKED results on the same artifact; iteration is not an authorization to keep patching
- @../vis/packages/core/conduct/tier-sizing.md — prompt verbosity scales inversely with model tier; Haiku needs mechanical steps, Opus runs on intent
- @../vis/packages/web/conduct/web-fetch.md — external URL handling: cache, dedup, budget; WebFetch is Haiku-tier-only
- @../vis/packages/web/conduct/research-pipeline.md — multi-phase web research discipline: 6-phase shape, work-budget floors, mandatory adversarial round, 15-min wall-clock floor
- @../vis/packages/web/conduct/source-discipline.md — handling web-fetched evidence: untrusted-source quote wrapping, independence checks, τ, dissemination_score, source-type weighting
- @../vis/packages/web/conduct/citation-verification.md — trace + re-fetch protocol; Wayback Machine fallback; 4-class support taxonomy (Supported / Partial / Unsupported / Uncertain); refetch_pass_rate thresholds
- @../vis/packages/web/conduct/mcp-research-discipline.md — opt-in MCP integration for research fetchers: per-query MCP routing (Brave / Tavily / Zotero / Playwright), three security gates (manifest audit, version pin, least-privilege creds), monoculture check
- @shared/conduct/inference-substrate.md — cross-session evidence accumulation; emit to and read from the inference-engine substrate without corrupting its honest-numbers contract
When a module conflicts with a plugin-local instruction, the plugin wins — but log the override.
Wixie is skill-invoked, not hook-driven. The single hook is advisory (prompt-save notification). Each skill hands artifacts to the next.
| Stage | Skill | Agent tier | Artifact produced |
|---|---|---|---|
| Research | /deep-research (also auto-fires inside /create) |
Opus decomposer/synth + Sonnet triangulator + Haiku fetchers/validator | state/briefs/<slug>/{claims.json, sources.jsonl, trace.json} |
| Craft | /create |
Opus orchestrator + Haiku reviewer | prompt.*, metadata.json |
| Refine | /refine |
Opus orchestrator + Haiku reviewer | prompt.* (v++), metadata.json |
| Converge | /converge |
Sonnet optimizer + Haiku reviewer | learnings.md, updated scores |
| Test | /test-prompt |
Sonnet executor | tests.json, pass/fail |
| Harden | /harden |
Sonnet red-team | audit.json (12 attacks) |
| Translate | /translate-prompt --to <model> |
Sonnet adapter | prompt.<new>, score comparison |
E0 Deep Research (factual ground truth) · E1 Gauss Convergence · E2 Boolean Satisfiability Overlay · E3 Cross-Domain Adaptation · E4 Adversarial Robustness · E5 Static-Dynamic Dual Verification · E6 Gauss Accumulation (self-learning). Derivations: docs/science/README.md.
E0 auto-fires inside /create (and /refine) when the topic depends on external or time-sensitive facts, producing a verified cited brief that the crafter folds into the prompt's <context>. Standalone entry: /deep-research <topic>. Briefs are reused across skills when freshness < 30 days.
| Verdict | Criteria | Action |
|---|---|---|
| DEPLOY | σ < 0.45 and overall ≥ 9.0 and all 5 axes ≥ 7.0 and 8/8 SAT assertions pass | Ship. Save artifacts. |
| HOLD | σ ≥ 0.45 or any axis < 7.0 | Re-run /converge. Do not hand back weak prompts. |
| FAIL | Reviewer flags registry mismatch / stale technique / format drift | Fix the flagged issue, then retry. |
σ = standard deviation of the 5 axis scores from 10. 8 SAT assertions: has_role, has_task, has_format, has_constraints, has_edge_cases, no_hedges, no_filler, has_structure.
- IMPORTANT — Check
shared/models-registry.jsonbefore generating. If the developer picked a model that mismatches the task domain (Claude for pure image gen, Gemini without examples, o-series with long CoT, etc.), warn with a better alternative before spending tokens. - Format follows model. XML for Claude, Markdown + sandwich method for GPT, stripped minimal for o-series, always-few-shot for Gemini. Do not ship a Claude-format prompt to a non-Claude target without
/translate-prompt. - YOU MUST respect the no-regression contract. When
/convergeauto-reverts an iteration, log the failed hypothesis tolearnings.mdand pick a different axis. Do not override. - YOU MUST NOT inflate scores. The honest-numbers contract is the product. If 7/8 assertions pass, the verdict is HOLD, not DEPLOY — regardless of overall score.
- Ask, don't guess. If a metadata field is unknown, ask the developer or run the engine. Never fabricate scores, costs, or technique lists.
- ESCALATE on image prompts. DALL-E, Midjourney, SD, Wixie, Nano Banana, etc. are collaborative — wait for the developer's 1–10 rating and visual feedback each round. After 5+ rounds without progress, recommend a different image model.
- ESCALATE on unknown target model. If the target model ID is not in
shared/models-registry.json, stop and ask. The registry is the capability source of truth. - Offer commit + push after registry or shared-artifact edits. Whenever you edit
shared/models-registry.json, ashared/vis/conduct/*.mdmodule,shared/conduct/inference-substrate.md, or anything a downstream plugin reads as source-of-truth, end the turn by asking whether to commit and push — don't wait for the developer to remember. State the change in one line ("registry bumped to N models, last_updated YYYY-MM-DD") and wait for yes before running git.
prompts/<name>/
├── prompt.<ext> production prompt, format matches target model
├── metadata.json model, tokens, cost, 5-axis scores, 8 assertions, version
├── tests.json regression test cases (≥ 3, ≥ 1 edge-case)
├── report.pdf dark-themed single-page audit (final only)
└── learnings.md E6 hypothesis/outcome log — persists across sessions
Folder hygiene. Intermediate HTML / diffs / scratch live in the plugin's state/ dir. Only the final PDF stays in prompts/<name>/. The prompt folder is a handoff surface, not a work-in-progress.
| Tier | Model | Used for |
|---|---|---|
| Orchestrator | Opus | Judgment, intent, technique selection (crafter, refiner) |
| Executor | Sonnet | Convergence loop, adversarial attacks, format conversion, test execution |
| Validator | Haiku | Quality gate — file completeness, metadata consistency, score freshness |
Respect the tiering. Routing a Haiku validation task to Opus burns budget and breaks the cost contract.
- Claude-to-GPT copy. Dropping a Claude-formatted prompt into a GPT folder. GPT needs sandwich method and different emphasis conventions — run
/translate-prompt. - Scratch in prompts/. Leaving HTML, diff, or iteration artifacts in the prompt folder. They belong in
state/; onlyreport.pdfships. - Unverified translation. Handing back a translated prompt without a score comparison. Translation without verification is not translation.
- Autonomous image loops. Iterating an image prompt without visual feedback. Text prompts converge on assertions; image prompts converge on developer ratings.
- DEPLOY claim with stale metadata. Verdict comes from the current convergence run's scores, not
metadata.jsonfrom a prior session. Re-run self-eval if unsure.