Based on: Janet Gregory's "Three Amigos" collaborative testing approach
Purpose: Facilitate productive specification conversations for new Ralph features
Audience: Developers, Testers, Product Owners working on Ralph enhancements
A specification workshop brings together three perspectives ("Three Amigos") to define features before implementation:
- Developer (How to implement) - Technical feasibility and approach
- Tester (How to verify) - Edge cases, validation, quality criteria
- Product Owner / User (What's the value) - Business requirements and success criteria
Goal: Produce concrete, testable specifications that prevent bugs and misunderstandings.
Participants:
- Developer: [Name]
- Tester: [Name]
- Product Owner: [Name]

Date: YYYY-MM-DD
Duration: 30-60 minutes
As a [role]
I want [capability]
So that [benefit]
Example:
As a Ralph user
I want circuit breaker auto-recovery
So that temporary issues don't require manual intervention
What makes this feature "done" and valuable?
Criteria:
- [Measurable criterion 1]
- [Measurable criterion 2]
- [Measurable criterion 3]
Example:
- Circuit breaker auto-recovers when progress resumes
- User is notified of recovery via log message
- Recovery happens within 1 loop iteration
What needs clarification? What could go wrong?
Tester Questions:
- What happens if [edge case 1]?
- How do we verify [behavior 2]?
- What's the expected behavior when [scenario 3]?
Answers:
- [Answer to question 1]
- [Answer to question 2]
- [Answer to question 3]
Example:
Q: What happens if circuit opens and closes rapidly (flapping)?
A: Circuit requires 2 stable loops in CLOSED before considering fully recovered
Q: How do we test auto-recovery?
A: Integration test: force HALF_OPEN state, simulate progress, verify CLOSED
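The flapping answer above can be made concrete with a small counter. This is a minimal sketch under stated assumptions, not Ralph's implementation: `update_stable_loops`, `is_fully_recovered`, and `STABLE_LOOPS_REQUIRED` are hypothetical names, and a loop is reported as `stable` when the circuit stayed CLOSED for that loop.

```bash
# Assumed threshold, taken from the "2 stable loops" answer above.
STABLE_LOOPS_REQUIRED=2

# Prints the updated count of consecutive stable loops.
# $1: current count, $2: "stable" or "unstable" for the latest loop.
update_stable_loops() {
    if [ "$2" = "stable" ]; then
        echo $(( $1 + 1 ))
    else
        # Any unstable loop resets the streak, which is what damps flapping.
        echo 0
    fi
}

# Succeeds (exit status 0) once enough consecutive stable loops were seen.
is_fully_recovered() {
    [ "$1" -ge "$STABLE_LOOPS_REQUIRED" ]
}
```

Writing the rule down this way gives the tester something precise to probe, for example whether one unstable loop should reset the streak entirely or only decrement it.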
How will this be built? What are the technical constraints?
Approach:
- [High-level implementation strategy]
- [Key components to modify]
- [Dependencies or prerequisites]
Constraints:
- [Technical limitation 1]
- [Technical limitation 2]
Example:
Approach:
- Modify `record_loop_result()` to track recovery attempts
- Add `recovery_count` field to circuit breaker state
- Implement recovery validation logic in state transitions
Constraints:
- Must maintain backward compatibility with existing state files
- Recovery logic must not slow down normal loop execution
Concrete scenarios using Given/When/Then format.
Given:
- [Initial condition 1]
- [Initial condition 2]
When: [Action or trigger]
Then:
- [Expected outcome 1]
- [Expected outcome 2]
And:
- [Additional verification]
Example:
Given:
- Circuit breaker is in HALF_OPEN state
- consecutive_no_progress is 2
- last_progress_loop was loop #10
When: Loop #13 completes with 3 files changed
Then:
- Circuit breaker transitions to CLOSED state
- consecutive_no_progress resets to 0
- last_progress_loop updates to 13
- Log message: "✅ CIRCUIT BREAKER: Normal Operation - Progress detected, circuit recovered"
And:
- Circuit breaker history records the HALF_OPEN → CLOSED transition
- .circuit_breaker_state file contains state: "CLOSED"
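The example scenario above maps almost directly onto an executable check. The following is a minimal sketch for illustration only: `next_state` is a hypothetical pure function, not part of Ralph, and it models just one rule (HALF_OPEN plus at least one changed file yields CLOSED). Counters, history, and the state file are left out.

```bash
# Hypothetical transition function, for illustration only.
# $1: current state, $2: number of files changed in the loop.
# Prints the next state; every case other than the recovery rule
# keeps the current state.
next_state() {
    current_state=$1
    files_changed=$2
    if [ "$current_state" = "HALF_OPEN" ] && [ "$files_changed" -gt 0 ]; then
        echo "CLOSED"
    else
        echo "$current_state"
    fi
}

# Given: circuit breaker is in HALF_OPEN state
# When:  the loop completes with 3 files changed
result=$(next_state "HALF_OPEN" 3)

# Then:  circuit breaker transitions to CLOSED state
if [ "$result" = "CLOSED" ]; then
    echo "PASS: HALF_OPEN + progress -> CLOSED"
else
    echo "FAIL: expected CLOSED, got $result"
fi
```

Keeping the Given/When/Then wording in comments makes it easy to trace each check back to the scenario agreed in the workshop.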
[Repeat format above for 3-5 key scenarios]
What unusual situations must be handled?
Edge Cases:
- [Edge case 1] → [Expected behavior]
- [Edge case 2] → [Expected behavior]
- [Edge case 3] → [Expected behavior]
Error Conditions:
- [Error condition 1] → [Error handling strategy]
- [Error condition 2] → [Error handling strategy]
Example:
Edge Cases:
- Circuit opens and closes in same second → Track transitions, no timestamp collision
- Recovery during rate limit wait → Allow recovery, don't block on rate limit
- File changes detected but tests fail → Don't consider full recovery, stay in HALF_OPEN
Error Conditions:
- Circuit state file corrupted → Reinitialize to CLOSED, log warning
- jq command not available → Fall back to manual parsing or disable circuit breaker
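The two error conditions in the example above could be handled along these lines. This is a sketch under assumptions, not Ralph's actual code: `load_circuit_state` is a hypothetical helper, the state file is assumed to be a JSON object with a `"state"` key, and OPEN, HALF_OPEN, and CLOSED are assumed to be the only valid values.

```bash
# Hypothetical loader, for illustration only.
# $1: path to the state file. Prints the current state.
load_circuit_state() {
    state_file=$1
    state=""
    if [ -r "$state_file" ]; then
        if command -v jq >/dev/null 2>&1; then
            # Preferred path: let jq parse the file; a parse error leaves state empty.
            state=$(jq -r '.state // empty' "$state_file" 2>/dev/null) || state=""
        else
            # Fallback when jq is unavailable: crude pattern match with sed.
            state=$(sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' "$state_file" | head -n 1)
        fi
    fi
    case "$state" in
        CLOSED|HALF_OPEN|OPEN)
            echo "$state"
            ;;
        *)
            # Missing, corrupted, or unrecognized content: reinitialize and warn.
            echo "WARN: circuit state unreadable, reinitializing to CLOSED" >&2
            printf '{"state": "CLOSED"}\n' > "$state_file"
            echo "CLOSED"
            ;;
    esac
}
```

The warning goes to stderr so that callers capturing the state with `$(load_circuit_state path)` receive only the state name.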
How will we verify this works?
Unit Tests:
- [Unit test 1]
- [Unit test 2]
Integration Tests:
- [Integration test 1]
- [Integration test 2]
Manual Tests:
- [Manual verification 1]
Example:
Unit Tests:
- Test state transition logic: HALF_OPEN + progress → CLOSED
- Test state persistence across function calls
Integration Tests:
- Full loop cycle: trigger HALF_OPEN, simulate recovery, verify CLOSED
- Verify log messages appear with correct formatting
- Test recovery with real file changes via git
Manual Tests:
- Run ralph-monitor during recovery and observe state changes
- Verify .circuit_breaker_history contains transition records
Performance, security, usability considerations.
Performance:
- [Requirement 1]
- [Requirement 2]
Security:
- [Requirement 1]
Usability:
- [Requirement 1]
Example:
Performance:
- Recovery detection must complete in < 100ms
- No memory leaks from repeated state transitions
Security:
- State files must not expose sensitive project information
- Circuit breaker must not bypass API rate limits
Usability:
- Recovery messages must be clear and actionable
- User should understand why recovery occurred
When can we consider this feature complete?
Checklist:
- Code implemented and reviewed
- All unit tests passing
- All integration tests passing
- Edge cases handled and tested
- Documentation updated
- Examples added
- Manually tested in realistic scenario
- Merged to main branch
What needs to happen next?
Action Items:
- [Person] - [Action] - [Deadline]
- [Person] - [Action] - [Deadline]
Example:
- Developer - Implement recovery logic - 2025-10-02
- Tester - Write integration tests - 2025-10-02
- Product Owner - Review and approve scenarios - 2025-10-03
Feature: Automatic retry on API rate limit errors
As a Ralph user
I want automatic retries on temporary API errors
So that transient issues don't stop my development workflow
- Ralph detects "rate_limit_error" in Claude output
- Ralph waits an appropriate time before retrying (5 minutes)
- Ralph limits retries to 3 attempts
- Ralph falls back to user prompt on persistent failure
- Retry attempts are logged clearly
Q: What counts as a "rate limit error" vs other errors?
A: Specific string "rate_limit_error" or "429" status code in output
Q: Should retries count against hourly call limit?
A: Yes, retry attempts consume call quota
Q: What if the user presses Ctrl+C during the wait period?
A: Graceful shutdown, save state, allow resume
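The first answer above is precise enough to sketch as a detector. The function below is a hypothetical illustration, not Ralph's code. It assumes the Claude output is available as a string, and it matches `429` only when it is not part of a longer number, which is a design choice the answer does not specify.

```bash
# Hypothetical detector: succeeds (exit status 0) when the given output
# contains "rate_limit_error" or a standalone 429 status code.
# $1: captured output to inspect.
is_rate_limit_error() {
    printf '%s\n' "$1" | grep -Eq 'rate_limit_error|(^|[^0-9])429([^0-9]|$)'
}
```

A possible call site would be `if is_rate_limit_error "$output"; then ...; fi`. Matching a bare `429` anywhere in the output would also fire on text such as "processed 1429 files", which is the kind of edge case worth raising in the workshop.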
Approach:
- Add retry logic to `execute_claude_code()` function
- Implement exponential backoff (5 min → 10 min → 20 min)
- Store retry state in `.retry_state` file
- Add retry counter to status.json
Constraints:
- Must work with existing rate limit tracking
- Cannot bypass circuit breaker
- Retries must respect API 5-hour limit
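The backoff in the approach above can be computed with a few lines of shell. This is a minimal sketch, not Ralph's code: `backoff_seconds` is a hypothetical helper, and it assumes a doubling schedule that starts at 5 minutes (300 s, 600 s, 1200 s). A schedule that adds a fixed 5 minutes per attempt would be linear rather than exponential.

```bash
# Hypothetical helper: prints the wait time in seconds before retry N.
# $1: retry attempt number, starting at 1.
backoff_seconds() {
    attempt=$1
    delay=300          # assumed base delay: 5 minutes
    i=1
    while [ "$i" -lt "$attempt" ]; do
        delay=$((delay * 2))   # double the delay for each further attempt
        i=$((i + 1))
    done
    echo "$delay"
}
```

For example, `backoff_seconds 2` prints `600`. With a limit of 3 retries the total wait under this schedule is 35 minutes, which should be checked against the API 5-hour limit listed in the constraints.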
Scenario 1: Successful Retry
Given:
- Ralph executes Claude Code at loop #5
- Claude returns "rate_limit_error: please retry"
- Retry count is 0
When: Ralph detects the rate limit error
Then:
- Ralph logs "Rate limit detected, attempt 1/3. Waiting 5 minutes..."
- Ralph sleeps for 300 seconds
- Ralph retries Claude Code execution
- If successful: continues normally, resets retry count to 0
Scenario 2: Persistent Failure
Given:
- Ralph has retried 3 times already
- Each retry resulted in "rate_limit_error"
When: 4th execution also returns rate limit error
Then:
- Ralph logs "Retry limit exceeded (3 attempts)"
- Ralph prompts user: "Continue waiting? (y/n)"
- User decision determines next action (exit or continue)
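Scenarios 1 and 2 together describe a bounded retry loop, which might look like the sketch below. Everything here is an assumption for illustration: `run_with_retry`, `MAX_RETRIES`, and `SLEEP_CMD` are hypothetical names, the wrapped command is assumed to print `rate_limit_error` when rate limited, and the wait is fixed at 300 seconds as in Scenario 1. The user prompt from Scenario 2 is left to the caller.

```bash
MAX_RETRIES=3

# Hypothetical wrapper. "$@" is the command to run.
# SLEEP_CMD defaults to sleep and can be overridden in tests.
# Returns 1 when the retry limit is exceeded; otherwise passes through
# the command's output and exit status.
run_with_retry() {
    retry_count=0
    while :; do
        status=0
        output=$("$@" 2>&1) || status=$?
        case "$output" in
            *rate_limit_error*)
                ;;  # rate limited: continue to the retry handling below
            *)
                # Success or a different error: hand back to the caller
                # without touching the retry count.
                printf '%s\n' "$output"
                return "$status"
                ;;
        esac
        if [ "$retry_count" -ge "$MAX_RETRIES" ]; then
            echo "Retry limit exceeded ($MAX_RETRIES attempts)" >&2
            return 1
        fi
        retry_count=$((retry_count + 1))
        echo "Rate limit detected, attempt $retry_count/$MAX_RETRIES. Waiting 5 minutes..." >&2
        "${SLEEP_CMD:-sleep}" 300
    done
}
```

With this structure the command runs at most four times (one initial execution plus three retries), matching the "4th execution" wording in Scenario 2. Making the sleep command replaceable keeps integration tests from waiting five minutes per attempt.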
- Rate limit error during first loop → Retry logic applies as in any other loop
- User interrupts during wait → Clean shutdown, state preserved
- Different error after retry → Handle as normal error, don't increment retry count
- Rate limit resolves after 1st retry → Reset counter, continue normally
Unit Tests:
- Test retry detection logic
- Test exponential backoff calculation
- Test retry limit enforcement
Integration Tests:
- Mock rate limit error, verify retry happens
- Mock 3 failures, verify fallback to user prompt
- Verify retry state persists across restarts
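The last integration test above depends on retry state surviving a restart. A minimal sketch of that persistence follows. The format is an assumption: the source names a `.retry_state` file but does not describe its contents, so this example stores a single integer. `save_retry_count`, `load_retry_count`, and `RETRY_STATE_FILE` are hypothetical names.

```bash
# Assumed location and format: a single integer in a plain-text file.
RETRY_STATE_FILE=${RETRY_STATE_FILE:-.retry_state}

# $1: retry count to store.
save_retry_count() {
    printf '%s\n' "$1" > "$RETRY_STATE_FILE"
}

# Prints the stored count, or 0 when the file is missing, empty,
# or does not contain a non-negative integer.
load_retry_count() {
    count=""
    if [ -r "$RETRY_STATE_FILE" ]; then
        count=$(head -n 1 "$RETRY_STATE_FILE")
    fi
    case "$count" in
        ''|*[!0-9]*) echo 0 ;;
        *) echo "$count" ;;
    esac
}
```

A restart test can then call `save_retry_count 2`, start a fresh shell, and check that `load_retry_count` prints `2`. Treating unreadable content as 0 is one possible policy; the workshop should decide whether that is safer than refusing to continue.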
- Code implemented in ralph_loop.sh
- Unit tests added to tests/unit/
- Integration tests added to tests/integration/
- Documentation updated in README.md
- Manually tested with mock API errors
- Merged to main
- Prepare: Send user story to participants 24 hours ahead
- Context: Provide relevant background (why this feature now?)
- Time-box: Schedule 30-60 minutes max
- Focus: One feature at a time
- Concrete: Use real examples, not abstract descriptions
- Questions: Encourage tester to ask "what could go wrong?"
- Document: Capture decisions in real-time
- Summarize: Send notes to all participants
- Track: Create tasks for action items
- Reference: Use scenarios for test cases
- ❌ "We'll figure it out during implementation"
- ❌ "That's an edge case, we'll handle it later"
- ❌ Vague acceptance criteria
- ❌ No concrete examples
- ❌ Skipping the tester perspective
- ✅ Clear, testable scenarios
- ✅ Edge cases identified before coding
- ✅ All three perspectives represented
- ✅ Concrete examples, not abstractions
- ✅ Shared understanding among participants
# Feature: [Name]
**User Story**: As [role], I want [capability] so that [benefit]
**Key Scenarios**:
1. Given [state], When [action], Then [outcome]
2. Given [state], When [action], Then [outcome]
**Edge Cases**:
- [Case 1] → [Behavior]
- [Case 2] → [Behavior]
**Tests**:
- [ ] [Test 1]
- [ ] [Test 2]
**Done When**:
- [ ] Implemented
- [ ] Tested
- [ ] Documented

References:
- Three Amigos: https://www.agilealliance.org/glossary/three-amigos/
- Specification by Example - Gojko Adzic
- Agile Testing - Lisa Crispin, Janet Gregory
Last Updated: 2025-10-01
Status: Phase 2 Complete
Next: Use this template for all new Ralph features