Build a reviewer loop that reads the builder's artifacts read-only and emits a `review_report.json` with scores across five dimensions, a total out of 10, and a verdict of `pass`, `soft_fail`, or `hard_fail`.

- `ReviewerInputs` bundling diff, state, feedback, and verification verdict from prior lessons
- Rubric dimensions: problem fit, scope discipline, assumptions, verification quality, handoff readiness
- One scoring function per dimension (stub-grade for the lesson, deterministic)
- `review_report.json` writer with five scores, total, and verdict
- Two demo cases: a clean change and a "right tests, wrong problem" change
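
As a rough illustration of how the pieces listed above could fit together, here is a minimal sketch. Only the dimension names and the report's three parts (scores, total, verdict) come from the lesson. The field names on `ReviewerInputs`, the stub heuristics, the 0 to 10 scale per dimension, and the total computed as the mean are all assumptions made for this example. The verdict is passed in as an argument so the sketch stays focused on inputs, scoring, and writing.

```python
import json
from dataclasses import dataclass
from pathlib import Path

# Dimension names follow the rubric; the snake_case keys are an assumption.
DIMENSIONS = ("problem_fit", "scope_discipline", "assumptions",
              "verification_quality", "handoff_readiness")


@dataclass(frozen=True)
class ReviewerInputs:
    """Read-only bundle of builder artifacts (field names are hypothetical)."""
    diff: str
    state: dict
    feedback: str
    verification_verdict: str


# Deterministic stubs, one per dimension. The heuristics and the 0-10 scale
# are placeholders invented for this sketch, not the lesson's real rules.
def score_problem_fit(inputs):
    task = inputs.state.get("task", "")
    return 10 if task and task in inputs.diff else 2


def score_scope_discipline(inputs):
    files_touched = sum(1 for line in inputs.diff.splitlines()
                        if line.startswith("+++ "))
    return 10 if files_touched <= 3 else 4


def score_assumptions(inputs):
    return 8 if "assum" in inputs.feedback.lower() else 5


def score_verification_quality(inputs):
    return 9 if inputs.verification_verdict == "pass" else 3


def score_handoff_readiness(inputs):
    return 8 if inputs.state.get("handoff_notes") else 4


SCORERS = dict(zip(DIMENSIONS, (
    score_problem_fit, score_scope_discipline, score_assumptions,
    score_verification_quality, score_handoff_readiness)))


def score_all(inputs):
    """Run every stub scorer and return a {dimension: score} mapping."""
    return {name: scorer(inputs) for name, scorer in SCORERS.items()}


def build_report(scores, verdict):
    """Assemble the report; the total is assumed to be the mean score."""
    total = round(sum(scores.values()) / len(scores), 1)
    return {"scores": scores, "total": total, "verdict": verdict}


def write_report(report, path):
    """Serialize the report as JSON. The reviewer never edits the diff."""
    Path(path).write_text(json.dumps(report, indent=2))


clean = ReviewerInputs(
    diff="+++ b/app/retry.py\n+ add retry with backoff\n",
    state={"task": "retry with backoff", "handoff_notes": "ready"},
    feedback="Assumes idempotent requests.",
    verification_verdict="pass",
)
report = build_report(score_all(clean), verdict="pass")
print(json.dumps(report, indent=2))
```

Calling `write_report(report, "review_report.json")` would persist the same structure to disk. Keeping each scorer a plain function of `ReviewerInputs` is one way to leave room for swapping a stub out later without touching the writer.
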

- `python3 code/main.py` exits zero
- The clean change scores at least 7 with verdict `pass`
- The wrong-problem change drops below 5 on at least one dimension and verdict flips to `hard_fail`

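
The checks above suggest a verdict rule along the following lines. This is a sketch under stated assumptions: the thresholds 7 and 5 come from the criteria, but how they combine, the 0 to 10 scale per dimension, the total taken as the mean, and the band that maps to `soft_fail` are guesses made for illustration. The function name `decide_verdict` and the example scores are hypothetical.

```python
def decide_verdict(scores: dict) -> str:
    """Map per-dimension scores to a verdict (assumed rule, not the lesson's)."""
    total = sum(scores.values()) / len(scores)
    # Any single dimension below 5 is treated as disqualifying.
    if min(scores.values()) < 5:
        return "hard_fail"
    # A total of at least 7 with no weak dimension passes.
    if total >= 7:
        return "pass"
    # Everything in between is assumed to be a soft failure.
    return "soft_fail"


# Hypothetical scores for the two demo cases.
clean_change = {
    "problem_fit": 9, "scope_discipline": 8, "assumptions": 8,
    "verification_quality": 9, "handoff_readiness": 8,
}
# "Right tests, wrong problem": verification looks strong, problem fit does not.
wrong_problem = {
    "problem_fit": 2, "scope_discipline": 8, "assumptions": 6,
    "verification_quality": 9, "handoff_readiness": 7,
}

print(decide_verdict(clean_change))
print(decide_verdict(wrong_problem))
```

Checking the weakest dimension before the total means a strong average cannot mask a change that solves the wrong problem, which is the failure the second demo case is meant to catch.
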
- Real LLM calls. The lesson stubs each dimension; the skill swaps in a model later.
- Editing the diff. The reviewer reads, scores, and reports. Patches are the builder's job next turn.

- `docs/en.md` - full lesson
- `code/main.py` - reference implementation
- `outputs/skill-reviewer-agent.md` - extracted skill