Run the same small repo task twice, once prompt-only and once with the seven workbench surfaces wired in, and emit a failure-mode report that maps each missed surface to the symptom it caused.
- A stub agent and a tiny FastAPI-style handler to validate
- The seven-surface list (instructions, state, scope, feedback, verification, review, handoff)
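The handler the agent works on does not need FastAPI installed. Here is a minimal sketch that assumes only the FastAPI-style decorator shape (`@app.get(path)`) is wanted. It uses a plain route table instead of a framework. `get`, `ROUTES`, the `/health` route, and `validate_handler` are hypothetical names and are not taken from the lesson's code.

```python
from typing import Callable, Dict, Tuple

# Route table keyed by (method, path), filled in by the decorator below.
ROUTES: Dict[Tuple[str, str], Callable[[], dict]] = {}


def get(path: str):
    """Hypothetical stand-in for FastAPI's @app.get(path); registers a GET handler."""
    def register(fn: Callable[[], dict]) -> Callable[[], dict]:
        ROUTES[("GET", path)] = fn
        return fn
    return register


@get("/health")
def health() -> dict:
    # The tiny handler the agent's task is checked against (hypothetical route).
    return {"status": "ok"}


def validate_handler(method: str, path: str, expected: dict) -> bool:
    """True if a handler is registered for (method, path) and returns `expected`."""
    handler = ROUTES.get((method, path))
    return handler is not None and handler() == expected


if __name__ == "__main__":
    print(validate_handler("GET", "/health", {"status": "ok"}))  # True
    print(validate_handler("GET", "/items", {}))                 # False
```

Keeping the handler framework-free keeps the exercise deterministic and dependency-free, in the same spirit as the rule-based stub agent.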
- `code/main.py` that runs both pipelines back to back
- `failure_modes.json` summarizing the prompt-only run
- A one-line verdict for the workbench run
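Here is one way to sketch the deliverables above. It assumes each "run" is reduced to a rule-based check of which surfaces were wired in, and that `failure_modes.json` is a list of `{"surface", "symptom"}` objects. `run_pipeline`, `failure_report`, and the `SYMPTOMS` strings are hypothetical placeholders. This is not the lesson's reference implementation or its actual symptom mapping.

```python
import json
from pathlib import Path

# The seven surfaces, in the order the lesson lists them.
SURFACES = ("instructions", "state", "scope", "feedback",
            "verification", "review", "handoff")

# Hypothetical placeholder symptoms for illustration only; the lesson's own
# surface-to-symptom mapping is not reproduced here.
SYMPTOMS = {
    "instructions": "agent invents its own task conventions",
    "state": "agent loses track of what it already changed",
    "scope": "edits spill outside the target handler",
    "feedback": "errors never reach the agent",
    "verification": "run reports success without checking the handler",
    "review": "no one inspects the change before it lands",
    "handoff": "next run starts with no record of this one",
}


def run_pipeline(name, wired):
    """Rule-based stub run: any surface not wired in counts as missed."""
    missed = [s for s in SURFACES if s not in wired]
    return {"pipeline": name, "missed": missed}


def failure_report(run):
    """Map each missed surface to its (placeholder) symptom."""
    return [{"surface": s, "symptom": SYMPTOMS[s]} for s in run["missed"]]


def main(out_path=Path("failure_modes.json")):
    prompt_only = run_pipeline("prompt-only", wired=set())
    workbench = run_pipeline("workbench", wired=set(SURFACES))

    # Side-by-side log of the two runs, one row per surface.
    print(f"{'surface':<14}{'prompt-only':<14}workbench")
    for s in SURFACES:
        left = "missed" if s in prompt_only["missed"] else "ok"
        right = "missed" if s in workbench["missed"] else "ok"
        print(f"{s:<14}{left:<14}{right}")

    # Only the prompt-only run is summarized in failure_modes.json.
    out_path.write_text(json.dumps(failure_report(prompt_only), indent=2))

    # One-line verdict for the workbench run.
    if workbench["missed"]:
        verdict = "FAIL: workbench missed " + ", ".join(workbench["missed"])
    else:
        verdict = "PASS: workbench wired all seven surfaces"
    print(verdict)
    return verdict


if __name__ == "__main__":
    main()
```

Both runs use the same stub and the same surface order. The only variable between them is which surfaces are wired in, so any difference in the log can be traced back to a specific surface.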
- `python3 code/main.py` exits zero
- Output shows a side-by-side log of the two runs
- `failure_modes.json` lists every missed surface with the matching symptom
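The last criterion can be checked mechanically. Here is a sketch that assumes `failure_modes.json` is a JSON list of `{"surface": ..., "symptom": ...}` objects; this section does not fix the schema. `check_failure_modes` is a hypothetical helper. It returns a list of problems, and an empty list means the report passes.

```python
import json

SURFACES = ("instructions", "state", "scope", "feedback",
            "verification", "review", "handoff")


def check_failure_modes(text, expected_missed):
    """Return problems with a failure_modes.json payload; [] means it passes."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["top level must be a list"]

    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        surface = entry.get("surface") if isinstance(entry, dict) else None
        if surface not in SURFACES:
            problems.append(f"entry {i}: unknown surface {surface!r}")
            continue
        # Every listed surface must carry a non-empty symptom.
        if not entry.get("symptom"):
            problems.append(f"entry {i}: {surface} has no symptom")
        seen.add(surface)

    # Every surface the run missed must appear in the report.
    for s in expected_missed:
        if s not in seen:
            problems.append(f"missed surface {s} not reported")
    return problems
```

For the prompt-only run, `expected_missed` would be whatever surfaces the stub reports as unwired. If none are wired in that run, that is all seven.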
- Calling a real model. The stub is rule-based on purpose.
- Building any one surface in depth. That is what the next eleven lessons are for.
- `docs/en.md` - full lesson
- `code/main.py` - reference implementation
- `outputs/skill-workbench-audit.md` - extracted skill