
Commit dad027f

feat: apply research findings to rest-owl plugin
Key improvements based on analysis of 20+ articles on spec-driven development, agentic engineering, and visual regression testing:

- Frame as "agentic engineering" (Karpathy 2026), not vibe coding
- Add Phase 7 (Handoff & Ownership) to address skill gap concern (Osmani)
- Add project constitution (CLAUDE.md) to Phase 4 (from GitHub Spec Kit pattern)
- Switch Phase 6 to test-first: write tests from specs before implementation
- Add positioning statement to Phase 0 intake (from OpenAI PRD template)
- Add spec quality principles to feature-spec skill (from Thoughtworks SDD)
- Use Playwright Docker container in CI for consistent visual regression (best practices)
- Add dynamic content masking and flaky test prevention guidance
- Update README and command to reflect new Phase 7 and constitution artifact

https://claude.ai/code/session_0191H4s9PX5VxKfmKnrq3aYF
1 parent cd6cea8 commit dad027f

5 files changed

Lines changed: 104 additions & 24 deletions


plugins/rest-owl/README.md

Lines changed: 11 additions & 5 deletions
````diff
@@ -4,16 +4,19 @@
 
 Turn a simple idea into a fully researched, specified, designed, tested, and validated software project.
 
+This is **agentic engineering** — not vibe coding. Structured AI orchestration with human oversight at every phase. Better specs produce better code. Comprehensive tests enable confident delegation.
+
 ## What It Does
 
 You say "build a Notion clone." This plugin handles everything else:
 
 1. **Competitive Research** — Analyzes 5-10 existing products, builds feature matrices, identifies patterns
-2. **Feature Specification** — Writes detailed specs with user stories, acceptance criteria, data models, API definitions
+2. **Feature Specification** — Writes detailed specs with user stories, acceptance criteria, data models, API definitions. Specs are treated as executable prompts, not documentation.
 3. **Visual Design** — Creates a design system, ASCII wireframes, and renderable HTML mockups for every screen
-4. **Technical Architecture** — Makes and documents all tech stack decisions with justifications
+4. **Technical Architecture** — Makes and documents all tech stack decisions, plus generates a project constitution (`CLAUDE.md`) for consistent AI code generation
 5. **Implementation Planning** — Breaks the build into ordered milestones with testing requirements
-6. **Build & Validate** — Implements with unit tests, E2E tests, and visual regression testing in CI
+6. **Build & Validate** — Test-first: writes tests from specs, then implements until they pass. Visual regression testing in CI.
+7. **Handoff & Ownership** — Architecture walkthrough, code tour, and maintenance guide so you can own the codebase
 
 ## Usage
 
````
````diff
@@ -43,16 +46,19 @@ All design artifacts are saved to `docs/rest-owl/` in the project:
 
 ```
 docs/rest-owl/
-├── 00-intake.md # Project brief
+├── 00-intake.md # Project brief and positioning statement
 ├── 01-competitive-research.md # Market analysis
 ├── 02-feature-spec.md # Feature specifications
 ├── 03-design-system.md # Design tokens
 ├── 03-wireframes.md # ASCII wireframes
 ├── 03-mockups/ # HTML mockup files
 ├── 04-architecture.md # Technical architecture
-└── 05-implementation-plan.md # Build milestones
+├── 05-implementation-plan.md # Build milestones
+└── 06-handoff.md # Architecture guide and maintenance instructions
 ```
 
+A `CLAUDE.md` constitution file is also created in the project root during Phase 4.
+
 These artifacts serve as:
 
 - **Session checkpoints** — resume work across sessions
````

plugins/rest-owl/commands/rest-owl.md

Lines changed: 4 additions & 3 deletions
````diff
@@ -22,13 +22,14 @@ Check if `docs/rest-owl/` already exists — if so, this is a **resumption**. Re
 
 Follow the rest-owl skill phases in strict order:
 
-1. **Phase 0 — Intake**: Clarify the idea with targeted questions
+1. **Phase 0 — Intake**: Clarify the idea with targeted questions, draft positioning statement
 2. **Phase 1 — Research**: Competitive analysis (invoke `competitive-research` skill)
 3. **Phase 2 — Specification**: Detailed feature specs (invoke `feature-spec` skill)
 4. **Phase 3 — Visual Design**: Mockups and design system (invoke `visual-design` skill)
-5. **Phase 4 — Architecture**: Technical decisions and system design
+5. **Phase 4 — Architecture**: Technical decisions, system design, and project constitution (`CLAUDE.md`)
 6. **Phase 5 — Planning**: Implementation milestones
-7. **Phase 6 — Build & Validate**: Implementation with testing (invoke `validation-pipeline` skill)
+7. **Phase 6 — Build & Validate**: Test-first implementation with testing (invoke `validation-pipeline` skill)
+8. **Phase 7 — Handoff**: Architecture walkthrough, code tour, and maintenance guide
 
 **Critical**: Get user approval between each phase before proceeding.
 
````

plugins/rest-owl/skills/feature-spec/SKILL.md

Lines changed: 12 additions & 0 deletions
````diff
@@ -257,16 +257,28 @@ Write to `docs/rest-owl/02-feature-spec.md`:
 
 Feature specs within different domains are independent — use `Agent` tool to write specs for multiple domains simultaneously.
 
+## Spec Quality Principles
+
+Specifications are not documentation — they are **executable prompts** for Phase 6. The quality of the spec directly determines the quality of the generated code. Apply these principles:
+
+- **Clarity over brevity** — an AI coding agent can't ask clarifying questions mid-generation. If a spec is ambiguous, the agent will hallucinate an answer.
+- **Given/When/Then determinism** — every acceptance criterion should have exactly one correct behavior. Avoid "should handle gracefully" without defining what "gracefully" means.
+- **Domain language** — define terms in the glossary and use them consistently. This reduces AI hallucination by anchoring generation to specific vocabulary.
+- **Explicit over implicit** — state every assumption. What seems "obvious" to you is invisible to the agent. The "curse of knowledge" (assuming others know what you know) is the primary source of spec gaps.
+- **Testable criteria** — every acceptance criterion must be verifiable by an automated test. If you can't write a test for it, the criterion is too vague.
+
 ## Quality Checks
 
 Before completing this phase:
 
 - [ ] Every P0 and P1 feature has a complete specification
 - [ ] Every feature has at least 2 user stories with Given/When/Then
 - [ ] Every feature has at least 3 acceptance criteria
+- [ ] Every acceptance criterion is testable (could become an automated test)
 - [ ] Every feature identifies edge cases and error scenarios
 - [ ] Data model covers all entities referenced by features
 - [ ] API endpoints exist for all data operations
 - [ ] Cross-cutting concerns are documented
 - [ ] No circular dependencies between features
+- [ ] Glossary defines all domain-specific terms
 - [ ] User has approved the scope before detailed specs were written
````
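The Given/When/Then and testability principles above can be partly checked by machine before a human reviews the spec. The sketch below is a minimal lint pass over acceptance criteria. The `AcceptanceCriterion` shape, the `lintCriterion` function and the list of vague phrases are hypothetical illustrations, not part of the plugin. It flags empty clauses and "then" wording, such as "gracefully", that no test could assert on.

```typescript
// Hypothetical shape of one acceptance criterion from 02-feature-spec.md.
interface AcceptanceCriterion {
  id: string;
  given: string;
  when: string;
  then: string;
}

// Illustrative, non-exhaustive list of outcome words that no test can assert on.
const VAGUE_PHRASES = ["gracefully", "appropriately", "properly", "user-friendly", "as expected"];

// Return human-readable problems; an empty array means the criterion passed.
function lintCriterion(c: AcceptanceCriterion): string[] {
  const problems: string[] = [];
  for (const clause of ["given", "when", "then"] as const) {
    if (c[clause].trim() === "") problems.push(`${c.id}: missing "${clause}" clause`);
  }
  const outcome = c.then.toLowerCase();
  for (const phrase of VAGUE_PHRASES) {
    // Plain substring match: crude, but it catches the common offenders.
    if (outcome.includes(phrase)) problems.push(`${c.id}: vague outcome "${phrase}"`);
  }
  return problems;
}

const criteria: AcceptanceCriterion[] = [
  { id: "AC-1", given: "a workspace with 3 pages", when: "the user deletes one page", then: "the sidebar lists 2 pages" },
  { id: "AC-2", given: "the network is offline", when: "the user saves a page", then: "the app handles the failure gracefully" },
];

const report: string[] = [];
for (const c of criteria) report.push(...lintCriterion(c));
console.log(report); // one entry, flagging "gracefully" in AC-2
```

A check like this only catches obvious wording problems. It supports the "Every acceptance criterion is testable" review item but does not replace it.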

plugins/rest-owl/skills/rest-owl/SKILL.md

Lines changed: 61 additions & 15 deletions
````diff
@@ -15,17 +15,20 @@ allowed-tools: Bash, Read, Grep, Glob, Edit, Write, Agent, WebSearch, WebFetch,
 
 This skill turns a napkin-sketch idea into a production-ready software project. The user provides a simple concept; you deliver a fully researched, specified, designed, tested, and validated codebase.
 
+This is **agentic engineering** — not vibe coding. You are orchestrating structured work with human oversight at every stage. The planning phases (0-5) are where the engineering expertise matters most; the build phase (6) follows naturally from thorough upfront work. Better specs produce better code. Comprehensive tests enable confident delegation. Clean architecture reduces hallucination.
+
 ## Philosophy
 
 The user's job is to have the idea. Your job is everything else:
 
 - **Research** what already exists and what users expect
-- **Specify** every feature in detail with acceptance criteria
+- **Specify** every feature in detail with acceptance criteria — specs are executable prompts, not documentation
 - **Design** the visual interface with concrete mockups
-- **Architect** the technical solution with clear decisions
+- **Architect** the technical solution with clear decisions and a project constitution
 - **Plan** the implementation in buildable milestones
-- **Build** the project with proper testing at every layer
+- **Build** test-first: write tests from specs, then implement until they pass
 - **Validate** with automated visual regression and E2E tests in CI
+- **Hand off** with clear documentation so the user can own and maintain the code
 
 ## Activation
 
````
````diff
@@ -53,9 +56,10 @@ The rest-owl workflow has 7 phases. Each phase produces artifacts that feed the
 - **Tech preferences**: Any framework/language preferences or constraints?
 - **Key differentiator**: What should make this stand out from existing solutions?
 - **Timeline scope**: Full product or focused MVP subset?
-3. Save the intake summary to `docs/rest-owl/00-intake.md`
+3. Draft a **positioning statement** (2-3 sentences): what this project is, who it's for, and how it differs from existing solutions. This bridges research and specification.
+4. Save the intake summary to `docs/rest-owl/00-intake.md`
 
-**Output**: `docs/rest-owl/00-intake.md` — project brief with user's answers
+**Output**: `docs/rest-owl/00-intake.md` — project brief with user's answers and positioning statement
 
 ### Phase 1: Competitive Research
 
````
````diff
@@ -135,7 +139,19 @@ This phase produces:
 - Development workflow (dev server, hot reload, etc.)
 - Dependency list with justifications
 
-**Output**: `docs/rest-owl/04-architecture.md`
+4. **Project constitution** — a `CLAUDE.md` (or equivalent rules file) that establishes non-negotiable project rules for all subsequent AI-generated code:
+- Architecture patterns and constraints
+- Coding standards and conventions
+- Security requirements
+- Testing requirements (coverage targets, test patterns)
+- Naming conventions
+- Import/dependency rules
+
+This file is placed in the project root and ensures consistent code generation across all milestones. Inspired by GitHub's Spec Kit `/speckit.constitution` pattern.
+
+**Output**:
+- `docs/rest-owl/04-architecture.md` — technical decisions and system design
+- `CLAUDE.md` (project root) — constitution / project rules for AI code generation
 
 ### Phase 5: Implementation Plan
 
````
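To make the six constitution sections concrete, here is a sketch of the file's shape. It is expressed as a typed structure and rendered to Markdown. The `Constitution` interface, `renderConstitution` and every example rule are hypothetical. In the plugin, the agent writes `CLAUDE.md` directly, and the real rules come from the Phase 4 architecture decisions.

```typescript
// Hypothetical shape of a CLAUDE.md constitution; section names follow the Phase 4 list.
interface Constitution {
  projectName: string;
  architecture: string[];
  codingStandards: string[];
  security: string[];
  testing: string[];
  naming: string[];
  imports: string[];
}

// Render the constitution as Markdown, skipping sections that have no rules.
function renderConstitution(c: Constitution): string {
  const sections: Array<[string, string[]]> = [
    ["Architecture", c.architecture],
    ["Coding Standards", c.codingStandards],
    ["Security", c.security],
    ["Testing", c.testing],
    ["Naming", c.naming],
    ["Imports and Dependencies", c.imports],
  ];
  const lines: string[] = [`# ${c.projectName} Project Rules`, ""];
  for (const [heading, rules] of sections) {
    if (rules.length === 0) continue; // no empty headings
    lines.push(`## ${heading}`, "", ...rules.map((rule) => `- ${rule}`), "");
  }
  // Drop trailing blank lines, then end the file with exactly one newline.
  while (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  return lines.join("\n") + "\n";
}

// Example rules are illustrative only; a real project derives them from 04-architecture.md.
const claudeMd = renderConstitution({
  projectName: "Notes App",
  architecture: ["Route handlers call services; only services touch the database"],
  codingStandards: ["TypeScript strict mode; no `any` without a comment explaining why"],
  security: ["Validate every request body at the API boundary"],
  testing: ["Each acceptance criterion in docs/rest-owl/02-feature-spec.md has a matching test"],
  naming: ["UI component files use PascalCase"],
  imports: [],
});
console.log(claudeMd);
```

Keeping each rule to a single imperative line makes it easy for the agent to apply the rule. It also makes it easy for a reviewer to check generated code against it.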
````diff
@@ -167,34 +183,59 @@ This phase produces:
 
 **Goal**: Implement the project milestone by milestone with full test coverage.
 
+**Test-first approach**: Write tests derived from Phase 2 acceptance criteria BEFORE writing implementation code. Then implement until all tests pass. This is what makes agentic engineering reliable — with a solid test suite, the AI can iterate in a loop until tests pass, giving high confidence in the result.
+
 For each milestone:
 
 1. **Scaffold** — Create files, install dependencies, configure tooling
-2. **Implement** — Build features according to the spec
-3. **Unit test** — Write tests for all business logic
-4. **Component test** — Test UI components in isolation
-5. **E2E test** — Write Playwright tests for user flows
-6. **Visual baseline** — Capture screenshot baselines for visual regression
-7. **CI integration** — Ensure all tests run in CI with screenshot artifacts
+2. **Write tests first** — Translate acceptance criteria from Phase 2 into unit tests, component tests, and E2E tests. These tests will initially fail.
+3. **Implement** — Build features until all tests pass
+4. **Visual baseline** — Capture screenshot baselines for visual regression
+5. **CI integration** — Ensure all tests run in CI with screenshot artifacts
 
 **Invoke the `validation-pipeline` skill** to set up the testing infrastructure.
 
 After each milestone:
 
-- Run all tests locally
+- Run all tests locally — all must pass
 - Verify visual baselines match mockups from Phase 3
 - Commit with conventional commit messages
 - Update implementation plan with completion status
 
 **Output**: The actual codebase with full test coverage
 
+### Phase 7: Handoff & Ownership
+
+**Goal**: Ensure the user understands and can maintain the generated codebase.
+
+This phase addresses a critical reality: agentic engineering produces code quickly, but the user must own and maintain it. Without understanding the architecture and key decisions, the codebase becomes unmaintainable.
+
+1. **Architecture walkthrough** — Summarize the key architectural decisions and why they were made. Reference specific files and patterns.
+
+2. **Code tour** — Identify the 5-10 most important files/modules and explain what each does and how they connect.
+
+3. **Extension guide** — Document how to add common things:
+- A new feature (which files to create, what patterns to follow)
+- A new API endpoint
+- A new UI screen
+- A new test
+
+4. **Known limitations** — Be honest about what was deferred, simplified, or left as a TODO. List areas that will need attention as the project scales.
+
+5. **Maintenance checklist** — What the user should do regularly:
+- Dependency updates
+- Visual baseline updates after intentional UI changes
+- Test coverage monitoring
+
+**Output**: `docs/rest-owl/06-handoff.md` — architecture guide and maintenance instructions
+
 ## Artifact Directory Structure
 
 All rest-owl artifacts live under `docs/rest-owl/` in the project root:
 
 ```
 docs/rest-owl/
-├── 00-intake.md # Project brief and user answers
+├── 00-intake.md # Project brief, user answers, positioning statement
 ├── 01-competitive-research.md # Market analysis and feature matrix
 ├── 02-feature-spec.md # Complete feature specifications
 ├── 03-design-system.md # Colors, typography, components
````
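The Phase 6 test-first loop can be sketched without a test framework, so the sketch runs on its own. Everything in it is hypothetical. The note-renaming criteria, `renameNote` and the small `runSpecTests` harness stand in for real Phase 2 acceptance criteria and the project's actual test runner. The tests are written first and map one-to-one onto the Given/When/Then clauses. `renameNote` is the minimal implementation that makes them pass.

```typescript
// Hypothetical acceptance criteria, in the Phase 2 Given/When/Then style:
//   AC-1: Given a note titled "Draft", When the user renames it to a whitespace-only title,
//         Then the rename is rejected with "Title is required" And the note keeps "Draft".
//   AC-2: Given a note titled "Draft", When the user renames it to "  Plan  ",
//         Then its title becomes "Plan".

type Note = { id: string; title: string };
type RenameResult = { ok: true; note: Note } | { ok: false; error: string };

// Minimal deep-equality assertion so the sketch needs no test framework.
function assertEqual(actual: unknown, expected: unknown, label: string): void {
  const a = JSON.stringify(actual);
  const e = JSON.stringify(expected);
  if (a !== e) throw new Error(`${label}: expected ${e}, got ${a}`);
}

// Step 1: tests written from the criteria before renameNote is implemented.
const specTests: Array<{ name: string; run: () => void }> = [
  {
    name: "AC-1 rejects a whitespace-only title and keeps the old one",
    run: () => {
      const note: Note = { id: "n1", title: "Draft" };
      assertEqual(renameNote(note, "   "), { ok: false, error: "Title is required" }, "result");
      assertEqual(note.title, "Draft", "original title");
    },
  },
  {
    name: "AC-2 applies a valid title, trimming surrounding whitespace",
    run: () => {
      const note: Note = { id: "n1", title: "Draft" };
      assertEqual(renameNote(note, "  Plan  "), { ok: true, note: { id: "n1", title: "Plan" } }, "result");
    },
  },
];

// Step 2: the minimal implementation, written only to make the tests pass.
function renameNote(note: Note, newTitle: string): RenameResult {
  const title = newTitle.trim();
  if (title === "") return { ok: false, error: "Title is required" };
  return { ok: true, note: { ...note, title } }; // returns a copy; the input note is not mutated
}

// Step 3: run the suite; the agent repeats step 2 until `failed` is empty.
function runSpecTests(): { passed: number; failed: string[] } {
  const failed: string[] = [];
  for (const t of specTests) {
    try {
      t.run();
    } catch (err) {
      failed.push(`${t.name}: ${(err as Error).message}`);
    }
  }
  return { passed: specTests.length - failed.length, failed };
}

console.log(runSpecTests()); // { passed: 2, failed: [] }
```

In a real project, the same assertions would live in whatever test runner the architecture phase chose, for example `bun test` or Playwright. The agent would rerun the suite after each change until it is green.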
````diff
@@ -205,9 +246,12 @@ docs/rest-owl/
 │ └── ...
 ├── 03-wireframes.md # ASCII wireframes for all screens
 ├── 04-architecture.md # Technical decisions and system design
-└── 05-implementation-plan.md # Milestones, tasks, and testing strategy
+├── 05-implementation-plan.md # Milestones, tasks, and testing strategy
+└── 06-handoff.md # Architecture guide and maintenance instructions
 ```
 
+Additionally, a `CLAUDE.md` constitution file is created in the project root during Phase 4.
+
 ## User Checkpoints
 
 **Never proceed to the next phase without user approval.** After each phase:
````
````diff
@@ -246,3 +290,5 @@ If a session ends mid-workflow, the artifact files serve as checkpoints. On resu
 - **Never ship without visual regression** — screenshots in CI catch visual regressions automatically
 - **Never batch all testing to the end** — test each milestone as you build it
 - **Never make tech decisions without justification** — every choice in the architecture doc needs a "why"
+- **Never skip the handoff** — generating code the user can't maintain creates dangerous skill atrophy
+- **Never treat this as vibe coding** — every phase exists for a reason; accepting AI output without review is not agentic engineering
````

plugins/rest-owl/skills/validation-pipeline/SKILL.md

Lines changed: 16 additions & 1 deletion
````diff
@@ -237,11 +237,25 @@ for (const screen of screens) {
       maxDiffPixelRatio: 0.01, // Allow 1% pixel difference
       threshold: 0.2, // Color difference threshold
       animations: "disabled",
+      // Mask dynamic content to prevent flaky tests
+      mask: [
+        page.locator('[data-testid*="timestamp"]'),
+        page.locator('[data-testid*="avatar"]'),
+        page.locator('[data-testid*="relative-time"]'),
+      ],
     });
   });
 }
 ```
 
+**Important: Preventing flaky visual tests**:
+
+- **Mask dynamic content** — timestamps, avatars, counters, and any content that changes between runs
+- **Disable animations** — always use `animations: 'disabled'`
+- **Wait for fonts** — use `page.waitForLoadState('networkidle')` before screenshots
+- **Generate baselines in CI, not locally** — OS differences (macOS vs Linux) cause font rendering mismatches. Use Playwright's Docker image (`mcr.microsoft.com/playwright`) in CI for consistent results.
+- **Start with Chromium only** — add Firefox/WebKit when cross-browser visual bugs actually appear
+
 **Screenshot update workflow**:
 
 ```bash
````
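Several of the flaky-test rules above can also be set once in the Playwright config instead of in each assertion. This sketch assumes the project uses `@playwright/test` with a `playwright.config.ts` at the root. The `testDir` value is an assumption based on the `e2e/visual/` path the CI job runs. `expect.toHaveScreenshot` sets project-wide defaults for the same options used in the per-screen call. The single Chromium project follows the "Start with Chromium only" rule. Masks are still passed per call because they are page locators.

```typescript
// playwright.config.ts (sketch; assumes @playwright/test is installed)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  // Assumed root of the E2E and visual suites (CI runs e2e/visual/)
  testDir: "./e2e",
  expect: {
    // Defaults for every toHaveScreenshot() call; per-call options such as
    // `mask` still go in the test itself.
    toHaveScreenshot: {
      maxDiffPixelRatio: 0.01,
      threshold: 0.2,
      animations: "disabled",
    },
  },
  // Chromium only until cross-browser visual bugs actually show up
  projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
});
```

With these defaults in place, the per-screen `toHaveScreenshot` call only needs the options that differ, such as `mask`.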
````diff
@@ -306,11 +320,12 @@ jobs:
 
   visual-regression:
     runs-on: ubuntu-latest
+    container:
+      image: mcr.microsoft.com/playwright:v1.50.0-noble
     steps:
       - uses: actions/checkout@v4
       - uses: oven-sh/setup-bun@v2
       - run: bun install
-      - run: bunx playwright install --with-deps chromium
 
       - name: Run visual regression tests
         run: bunx playwright test e2e/visual/
````
