feat: apply research findings to rest-owl plugin

claude · claude · commit dad027f03f6c · 2026-03-20T16:51:24.000Z
Key improvements based on analysis of 20+ articles on spec-driven development, agentic engineering, and visual regression testing:

- Frame as "agentic engineering" (Karpathy 2026), not vibe coding
- Add Phase 7 (Handoff & Ownership) to address skill gap concern (Osmani)
- Add project constitution (CLAUDE.md) to Phase 4 (from GitHub Spec Kit pattern)
- Switch Phase 6 to test-first: write tests from specs before implementation
- Add positioning statement to Phase 0 intake (from OpenAI PRD template)
- Add spec quality principles to feature-spec skill (from Thoughtworks SDD)
- Use Playwright Docker container in CI for consistent visual regression (best practices)
- Add dynamic content masking and flaky test prevention guidance
- Update README and command to reflect new Phase 7 and constitution artifact

https://claude.ai/code/session_0191H4s9PX5VxKfmKnrq3aYF
diff --git a/plugins/rest-owl/README.md b/plugins/rest-owl/README.md
@@ -4,16 +4,19 @@
 
 Turn a simple idea into a fully researched, specified, designed, tested, and validated software project.
 
+This is **agentic engineering** — not vibe coding. Structured AI orchestration with human oversight at every phase. Better specs produce better code. Comprehensive tests enable confident delegation.
+
 ## What It Does
 
 You say "build a Notion clone." This plugin handles everything else:
 
 1. **Competitive Research** — Analyzes 5-10 existing products, builds feature matrices, identifies patterns
-2. **Feature Specification** — Writes detailed specs with user stories, acceptance criteria, data models, API definitions
+2. **Feature Specification** — Writes detailed specs with user stories, acceptance criteria, data models, API definitions. Specs are treated as executable prompts, not documentation.
 3. **Visual Design** — Creates a design system, ASCII wireframes, and renderable HTML mockups for every screen
-4. **Technical Architecture** — Makes and documents all tech stack decisions with justifications
+4. **Technical Architecture** — Makes and documents all tech stack decisions, plus generates a project constitution (`CLAUDE.md`) for consistent AI code generation
 5. **Implementation Planning** — Breaks the build into ordered milestones with testing requirements
-6. **Build & Validate** — Implements with unit tests, E2E tests, and visual regression testing in CI
+6. **Build & Validate** — Test-first: writes tests from specs, then implements until they pass. Visual regression testing in CI.
+7. **Handoff & Ownership** — Architecture walkthrough, code tour, and maintenance guide so you can own the codebase
 
 ## Usage
 
@@ -43,16 +46,19 @@ All design artifacts are saved to `docs/rest-owl/` in the project:
 
 ```
 docs/rest-owl/
-├── 00-intake.md                    # Project brief
+├── 00-intake.md                    # Project brief and positioning statement
 ├── 01-competitive-research.md      # Market analysis
 ├── 02-feature-spec.md              # Feature specifications
 ├── 03-design-system.md             # Design tokens
 ├── 03-wireframes.md                # ASCII wireframes
 ├── 03-mockups/                     # HTML mockup files
 ├── 04-architecture.md              # Technical architecture
-└── 05-implementation-plan.md       # Build milestones
+├── 05-implementation-plan.md       # Build milestones
+└── 06-handoff.md                   # Architecture guide and maintenance instructions
 ```
 
+A `CLAUDE.md` constitution file is also created in the project root during Phase 4.
+
 These artifacts serve as:
 
 - **Session checkpoints** — resume work across sessions
diff --git a/plugins/rest-owl/commands/rest-owl.md b/plugins/rest-owl/commands/rest-owl.md
@@ -22,13 +22,14 @@ Check if `docs/rest-owl/` already exists — if so, this is a **resumption**. Re
 
 Follow the rest-owl skill phases in strict order:
 
-1. **Phase 0 — Intake**: Clarify the idea with targeted questions
+1. **Phase 0 — Intake**: Clarify the idea with targeted questions, draft positioning statement
 2. **Phase 1 — Research**: Competitive analysis (invoke `competitive-research` skill)
 3. **Phase 2 — Specification**: Detailed feature specs (invoke `feature-spec` skill)
 4. **Phase 3 — Visual Design**: Mockups and design system (invoke `visual-design` skill)
-5. **Phase 4 — Architecture**: Technical decisions and system design
+5. **Phase 4 — Architecture**: Technical decisions, system design, and project constitution (`CLAUDE.md`)
 6. **Phase 5 — Planning**: Implementation milestones
-7. **Phase 6 — Build & Validate**: Implementation with testing (invoke `validation-pipeline` skill)
+7. **Phase 6 — Build & Validate**: Test-first implementation (invoke `validation-pipeline` skill)
+8. **Phase 7 — Handoff**: Architecture walkthrough, code tour, and maintenance guide
 
 **Critical**: Get user approval between each phase before proceeding.
 
diff --git a/plugins/rest-owl/skills/feature-spec/SKILL.md b/plugins/rest-owl/skills/feature-spec/SKILL.md
@@ -257,16 +257,28 @@ Write to `docs/rest-owl/02-feature-spec.md`:
 
 Feature specs within different domains are independent — use `Agent` tool to write specs for multiple domains simultaneously.
 
+## Spec Quality Principles
+
+Specifications are not documentation — they are **executable prompts** for Phase 6. The quality of the spec directly determines the quality of the generated code. Apply these principles:
+
+- **Clarity over brevity** — an AI coding agent can't ask clarifying questions mid-generation. If a spec is ambiguous, the agent will hallucinate an answer.
+- **Given/When/Then determinism** — every acceptance criterion should have exactly one correct behavior. Avoid "should handle gracefully" without defining what "gracefully" means.
+- **Domain language** — define terms in the glossary and use them consistently. This reduces AI hallucination by anchoring generation to specific vocabulary.
+- **Explicit over implicit** — state every assumption. What seems "obvious" to you is invisible to the agent. The "curse of knowledge" (assuming others know what you know) is the primary source of spec gaps.
+- **Testable criteria** — every acceptance criterion must be verifiable by an automated test. If you can't write a test for it, the criterion is too vague.
+
 ## Quality Checks
 
 Before completing this phase:
 
 - [ ] Every P0 and P1 feature has a complete specification
 - [ ] Every feature has at least 2 user stories with Given/When/Then
 - [ ] Every feature has at least 3 acceptance criteria
+- [ ] Every acceptance criterion is testable (could become an automated test)
 - [ ] Every feature identifies edge cases and error scenarios
 - [ ] Data model covers all entities referenced by features
 - [ ] API endpoints exist for all data operations
 - [ ] Cross-cutting concerns are documented
 - [ ] No circular dependencies between features
+- [ ] Glossary defines all domain-specific terms
 - [ ] User has approved the scope before detailed specs were written
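
The "Given/When/Then determinism" and "Testable criteria" principles added above could be checked mechanically before a spec is handed to Phase 6. The following is a minimal sketch under stated assumptions, not part of the plugin: the `AcceptanceCriterion` shape, the `VAGUE_TERMS` list, and the `findVagueTerms` helper are all hypothetical names introduced for illustration.

```typescript
// Hypothetical sketch: none of these names come from the rest-owl plugin.
interface AcceptanceCriterion {
  given: string;
  when: string;
  then: string;
}

// Illustrative (assumed) list of phrases that usually signal an outcome
// no automated test can verify.
const VAGUE_TERMS = ["gracefully", "appropriately", "as expected", "properly"];

// Returns the vague phrases found in the "then" clause. An empty array
// only means this simple check found nothing to flag.
function findVagueTerms(criterion: AcceptanceCriterion): string[] {
  const outcome = criterion.then.toLowerCase();
  return VAGUE_TERMS.filter((term) => outcome.includes(term));
}

const vague: AcceptanceCriterion = {
  given: "the upload service is unreachable",
  when: "the user submits a file",
  then: "the app should handle the failure gracefully",
};

const deterministic: AcceptanceCriterion = {
  given: "the upload service is unreachable",
  when: "the user submits a file",
  then: 'the form shows the error "Upload failed. Try again." and keeps the selected file',
};

console.log(findVagueTerms(vague)); // flags "gracefully"
console.log(findVagueTerms(deterministic)); // nothing flagged
```

A keyword check like this catches only the most obvious gaps. The second criterion is the kind the principles ask for: it names one observable outcome that a test can assert on.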
diff --git a/plugins/rest-owl/skills/rest-owl/SKILL.md b/plugins/rest-owl/skills/rest-owl/SKILL.md
@@ -15,17 +15,20 @@ allowed-tools: Bash, Read, Grep, Glob, Edit, Write, Agent, WebSearch, WebFetch,
 
 This skill turns a napkin-sketch idea into a production-ready software project. The user provides a simple concept; you deliver a fully researched, specified, designed, tested, and validated codebase.
 
+This is **agentic engineering** — not vibe coding. You are orchestrating structured work with human oversight at every stage. The planning phases (0-5) are where the engineering expertise matters most; the build phase (6) follows naturally from thorough upfront work. Better specs produce better code. Comprehensive tests enable confident delegation. Clean architecture reduces hallucination.
+
 ## Philosophy
 
 The user's job is to have the idea. Your job is everything else:
 
 - **Research** what already exists and what users expect
-- **Specify** every feature in detail with acceptance criteria
+- **Specify** every feature in detail with acceptance criteria — specs are executable prompts, not documentation
 - **Design** the visual interface with concrete mockups
-- **Architect** the technical solution with clear decisions
+- **Architect** the technical solution with clear decisions and a project constitution
 - **Plan** the implementation in buildable milestones
-- **Build** the project with proper testing at every layer
+- **Build** test-first: write tests from specs, then implement until they pass
 - **Validate** with automated visual regression and E2E tests in CI
+- **Hand off** with clear documentation so the user can own and maintain the code
 
 ## Activation
 
@@ -53,9 +56,10 @@ The rest-owl workflow has 7 phases. Each phase produces artifacts that feed the
    - **Tech preferences**: Any framework/language preferences or constraints?
    - **Key differentiator**: What should make this stand out from existing solutions?
    - **Timeline scope**: Full product or focused MVP subset?
-3. Save the intake summary to `docs/rest-owl/00-intake.md`
+3. Draft a **positioning statement** (2-3 sentences): what this project is, who it's for, and how it differs from existing solutions. This bridges research and specification.
+4. Save the intake summary to `docs/rest-owl/00-intake.md`
 
-**Output**: `docs/rest-owl/00-intake.md` — project brief with user's answers
+**Output**: `docs/rest-owl/00-intake.md` — project brief with user's answers and positioning statement
 
 ### Phase 1: Competitive Research
 
@@ -135,7 +139,19 @@ This phase produces:
    - Development workflow (dev server, hot reload, etc.)
    - Dependency list with justifications
 
-**Output**: `docs/rest-owl/04-architecture.md`
+4. **Project constitution** — a `CLAUDE.md` (or equivalent rules file) that establishes non-negotiable project rules for all subsequent AI-generated code:
+   - Architecture patterns and constraints
+   - Coding standards and conventions
+   - Security requirements
+   - Testing requirements (coverage targets, test patterns)
+   - Naming conventions
+   - Import/dependency rules
+
+   This file is placed in the project root and ensures consistent code generation across all milestones. Inspired by GitHub's Spec Kit `/speckit.constitution` pattern.
+
+**Output**:
+- `docs/rest-owl/04-architecture.md` — technical decisions and system design
+- `CLAUDE.md` (project root) — constitution / project rules for AI code generation
 
 ### Phase 5: Implementation Plan
 
@@ -167,34 +183,59 @@ This phase produces:
 
 **Goal**: Implement the project milestone by milestone with full test coverage.
 
+**Test-first approach**: Write tests derived from Phase 2 acceptance criteria BEFORE writing implementation code. Then implement until all tests pass. This is what makes agentic engineering reliable — with a solid test suite, the AI can iterate in a loop until tests pass, giving high confidence in the result.
+
 For each milestone:
 
 1. **Scaffold** — Create files, install dependencies, configure tooling
-2. **Implement** — Build features according to the spec
-3. **Unit test** — Write tests for all business logic
-4. **Component test** — Test UI components in isolation
-5. **E2E test** — Write Playwright tests for user flows
-6. **Visual baseline** — Capture screenshot baselines for visual regression
-7. **CI integration** — Ensure all tests run in CI with screenshot artifacts
+2. **Write tests first** — Translate acceptance criteria from Phase 2 into unit tests, component tests, and E2E tests. These tests will initially fail.
+3. **Implement** — Build features until all tests pass
+4. **Visual baseline** — Capture screenshot baselines for visual regression
+5. **CI integration** — Ensure all tests run in CI with screenshot artifacts
 
 **Invoke the `validation-pipeline` skill** to set up the testing infrastructure.
 
 After each milestone:
 
-- Run all tests locally
+- Run all tests locally — all must pass
 - Verify visual baselines match mockups from Phase 3
 - Commit with conventional commit messages
 - Update implementation plan with completion status
 
 **Output**: The actual codebase with full test coverage
 
+### Phase 7: Handoff & Ownership
+
+**Goal**: Ensure the user understands and can maintain the generated codebase.
+
+This phase addresses a critical reality: agentic engineering produces code quickly, but the user must own and maintain it. Without understanding the architecture and key decisions, the codebase becomes unmaintainable.
+
+1. **Architecture walkthrough** — Summarize the key architectural decisions and why they were made. Reference specific files and patterns.
+
+2. **Code tour** — Identify the 5-10 most important files/modules and explain what each does and how they connect.
+
+3. **Extension guide** — Document how to add common things:
+   - A new feature (which files to create, what patterns to follow)
+   - A new API endpoint
+   - A new UI screen
+   - A new test
+
+4. **Known limitations** — Be honest about what was deferred, simplified, or left as a TODO. List areas that will need attention as the project scales.
+
+5. **Maintenance checklist** — What the user should do regularly:
+   - Dependency updates
+   - Visual baseline updates after intentional UI changes
+   - Test coverage monitoring
+
+**Output**: `docs/rest-owl/06-handoff.md` — architecture guide and maintenance instructions
+
 ## Artifact Directory Structure
 
 All rest-owl artifacts live under `docs/rest-owl/` in the project root:
 
 ```
 docs/rest-owl/
-├── 00-intake.md                    # Project brief and user answers
+├── 00-intake.md                    # Project brief, user answers, positioning statement
 ├── 01-competitive-research.md      # Market analysis and feature matrix
 ├── 02-feature-spec.md              # Complete feature specifications
 ├── 03-design-system.md             # Colors, typography, components
@@ -205,9 +246,12 @@ docs/rest-owl/
 │   └── ...
 ├── 03-wireframes.md                # ASCII wireframes for all screens
 ├── 04-architecture.md              # Technical decisions and system design
-└── 05-implementation-plan.md       # Milestones, tasks, and testing strategy
+├── 05-implementation-plan.md       # Milestones, tasks, and testing strategy
+└── 06-handoff.md                   # Architecture guide and maintenance instructions
 ```
 
+Additionally, a `CLAUDE.md` constitution file is created in the project root during Phase 4.
+
 ## User Checkpoints
 
 **Never proceed to the next phase without user approval.** After each phase:
@@ -246,3 +290,5 @@ If a session ends mid-workflow, the artifact files serve as checkpoints. On resu
 - **Never ship without visual regression** — screenshots in CI catch visual regressions automatically
 - **Never batch all testing to the end** — test each milestone as you build it
 - **Never make tech decisions without justification** — every choice in the architecture doc needs a "why"
+- **Never skip the handoff** — generating code the user can't maintain creates dangerous skill atrophy
+- **Never treat this as vibe coding** — every phase exists for a reason; accepting AI output without review is not agentic engineering
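
The test-first loop described in Phase 6 above (write tests from acceptance criteria, then implement until they pass) can be sketched in a few lines. This is an illustration under stated assumptions, not the plugin's implementation: `createPage`, its validation rule, and the `failingTests` runner are hypothetical, and a real project would use its chosen test runner instead of this hand-rolled one.

```typescript
// Hypothetical example: createPage and its rules are illustrative only.
type Result =
  | { kind: "ok"; title: string }
  | { kind: "error"; error: string };

// Step 1: tests derived from acceptance criteria, written before the
// implementation exists. Each returns true when the criterion holds.
const tests: Array<{ name: string; run: () => boolean }> = [
  {
    name: "Given a blank title, when a page is created, then it is rejected",
    run: () => {
      const r = createPage("   ");
      return r.kind === "error" && r.error === "Title is required";
    },
  },
  {
    name: "Given a padded title, when a page is created, then the title is trimmed",
    run: () => {
      const r = createPage("  Roadmap ");
      return r.kind === "ok" && r.title === "Roadmap";
    },
  },
];

// Step 2: the implementation, iterated on until every test above passes.
function createPage(title: string): Result {
  const trimmed = title.trim();
  if (trimmed.length === 0) {
    return { kind: "error", error: "Title is required" };
  }
  return { kind: "ok", title: trimmed };
}

// Step 3: run the suite. The names of failing tests drive the next iteration.
function failingTests(): string[] {
  return tests.filter((t) => !t.run()).map((t) => t.name);
}

console.log(failingTests()); // empty once the implementation satisfies the spec
```

Because the test names restate the Given/When/Then criteria, a failing name points straight back to the line of the Phase 2 spec that is not yet satisfied.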
diff --git a/plugins/rest-owl/skills/validation-pipeline/SKILL.md b/plugins/rest-owl/skills/validation-pipeline/SKILL.md
@@ -237,11 +237,25 @@ for (const screen of screens) {
       maxDiffPixelRatio: 0.01, // Allow 1% pixel difference
       threshold: 0.2, // Color difference threshold
       animations: "disabled",
+      // Mask dynamic content to prevent flaky tests
+      mask: [
+        page.locator('[data-testid*="timestamp"]'),
+        page.locator('[data-testid*="avatar"]'),
+        page.locator('[data-testid*="relative-time"]'),
+      ],
     });
   });
 }
 ```
 
+**Preventing flaky visual tests**:
+
+- **Mask dynamic content** — timestamps, avatars, counters, and any content that changes between runs
+- **Disable animations** — always use `animations: 'disabled'`
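+- **Wait for fonts** — run `await page.evaluate(() => document.fonts.ready)` before screenshots; `page.waitForLoadState('networkidle')` alone does not guarantee that web fonts have finished loading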
+- **Generate baselines in CI, not locally** — OS differences (macOS vs Linux) cause font rendering mismatches. Use Playwright's Docker image (`mcr.microsoft.com/playwright`) in CI for consistent results.
+- **Start with Chromium only** — add Firefox/WebKit when cross-browser visual bugs actually appear
+
 **Screenshot update workflow**:
 
 ```bash
@@ -306,11 +320,12 @@ jobs:
 
   visual-regression:
     runs-on: ubuntu-latest
+    container:
+      image: mcr.microsoft.com/playwright:v1.50.0-noble
     steps:
       - uses: actions/checkout@v4
       - uses: oven-sh/setup-bun@v2
       - run: bun install
-      - run: bunx playwright install --with-deps chromium
 
       - name: Run visual regression tests
         run: bunx playwright test e2e/visual/