Version: 2.1 | Last Updated: 2026-01-26 | Category: Feature Guide
Quality Convergence = "Keep trying until it's good enough"
Instead of:
AI writes code → Tests fail → You manually fix → Tests fail again → Repeat 10x
Babysitter does:
AI writes code → Tests: 60% pass → AI fixes → Tests: 85% pass → AI fixes → Tests: 95% pass ✓ Done!
| Section | What It Covers | Read If You Want To... |
|---|---|---|
| Five Quality Gates | Types of checks (tests, lint, security, etc.) | Understand what gets checked |
| 90-Score Pattern | How to reliably hit high quality | Build production-ready workflows |
| Process Examples | Real code from the library | See working implementations |
| Step-by-Step | How to build your own | Create custom quality loops |
Here's what quality convergence looks like in practice:
Iteration 1:
- AI writes login feature
- Tests run: 3/10 passing (30%)
- AI sees: "Missing password validation, no error handling"
Iteration 2:
- AI fixes based on feedback
- Tests run: 7/10 passing (70%)
- AI sees: "Edge case for empty email not handled"
Iteration 3:
- AI fixes edge cases
- Tests run: 10/10 passing (100%)
- Quality target met! ✓
Output: Working login feature with all tests passing
Key insight: The AI doesn't just try once - it learns from each failure and improves.
Quality scores are multi-dimensional, not a single number. This is what makes Babysitter's quality convergence so effective - instead of a simple pass/fail, you get nuanced feedback across multiple dimensions that guides improvement.
A typical quality score includes:
| Dimension | What It Measures | Example |
|---|---|---|
| Tests | Pass rate and coverage | 92% tests passing, 85% coverage |
| Code Quality | Lint errors, complexity | 0 lint errors, complexity < 10 |
| Security | Vulnerabilities, secrets | 0 critical issues |
| Performance | Response time, bundle size | p95 < 500ms |
| Type Safety | Type errors, null safety | 0 type errors |
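As a sketch of what this looks like in practice (the shape and values below are illustrative, not Babysitter's actual report format), a multi-dimensional score can be modeled as an object whose weakest dimension points at the next improvement:

```javascript
// Illustrative multi-dimensional quality report (not an official Babysitter schema)
const qualityReport = {
  overall: 88.5,
  dimensions: {
    tests:       { score: 92,  detail: '92% passing, 85% coverage' },
    codeQuality: { score: 95,  detail: '0 lint errors, max complexity 8' },
    security:    { score: 100, detail: '0 critical issues' },
    performance: { score: 78,  detail: 'p95 at 520ms against a 500ms target' },
    typeSafety:  { score: 90,  detail: '0 type errors' }
  }
};

// Sorting dimensions by score surfaces the weakest one as the next focus area
const byWeakest = Object.entries(qualityReport.dimensions)
  .sort((a, b) => a[1].score - b[1].score)
  .map(([name]) => name);

console.log(byWeakest[0]); // 'performance' (the lowest-scoring dimension)
```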
You define what quality means for your project. The dimensions above are just examples - you can:
- Define your own 5 dimensions that matter most for your domain
- Ask Babysitter to suggest dimensions appropriate for your specific task
- Weight dimensions differently based on project phase or criticality
For example, a data pipeline might use completely different dimensions:
| Dimension | Weight | Threshold |
|---|---|---|
| Data Accuracy | 30% | > 99.9% |
| Processing Speed | 25% | < 5 min/GB |
| Schema Validation | 20% | 100% valid |
| Idempotency | 15% | All operations idempotent |
| Error Recovery | 10% | Auto-recovery < 30s |
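Whatever the dimensions, they typically collapse into one number as a weighted sum. A minimal sketch using the weights from the table above (the per-dimension scores are hypothetical):

```javascript
// Weights from the data-pipeline example above (must sum to 1.0)
const pipelineWeights = {
  dataAccuracy: 0.30,
  processingSpeed: 0.25,
  schemaValidation: 0.20,
  idempotency: 0.15,
  errorRecovery: 0.10
};

// Hypothetical per-dimension scores on a 0-100 scale
const dimensionScores = {
  dataAccuracy: 99,
  processingSpeed: 80,
  schemaValidation: 100,
  idempotency: 90,
  errorRecovery: 70
};

// Weighted sum: each dimension contributes score * weight
function weightedQuality(scores, weights) {
  return Object.entries(weights).reduce(
    (total, [dim, weight]) => total + scores[dim] * weight,
    0
  );
}

console.log(weightedQuality(dimensionScores, pipelineWeights)); // ≈ 90.2
```

Because the weights sum to 1.0, the result stays on the same 0-100 scale as the individual dimensions.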
This flexibility means quality convergence adapts to any domain - from ML model training to infrastructure deployment to documentation generation.
For detailed scoring formulas and weight configurations, see Best Practices - Custom Scoring Strategies.
Quality convergence is an iterative improvement pattern where Babysitter repeatedly refines work until a defined quality target is achieved. Instead of executing a task once and hoping for the best, quality convergence loops through implementation, testing, and scoring cycles until the output meets your standards.
From the Two-Loops Control Plane architecture, the fundamental principle is:
If you don't have evidence, you don't have completion. If you do only one thing: make completion require evidence.

This single principle transforms "it seems done" into "it is done."
Every phase must end with:
- Artifact: The work product (patch, doc, config, report)
- Evidence: Proof that it meets requirements (logs, test output, checks)
Built on that principle, quality convergence gives you:
- Consistent Quality: Guarantee outputs meet minimum quality thresholds
- Automated Refinement: Let the system iterate without manual intervention
- Measurable Results: Track quality scores across iterations
- Predictable Outcomes: Set clear targets and iteration limits
- TDD Integration: Combine with test-driven development for robust code
- Evidence-Based Completion: Every iteration produces verifiable proof of quality
Quality gates are not a single check. They form a layered validation system that ensures completeness from multiple perspectives. For robust quality convergence, use 4-5 gate types simultaneously.
Verifies the code behaves correctly across all levels.
// From: methodologies/v-model.js (V-Model process)
const testResults = await ctx.task(executeTestsTask, {
implementation,
unitTestDesigns, // Validates module design
integrationTestDesign, // Validates architecture
systemTestDesign, // Validates system design
acceptanceTestDesign // Validates requirements
});
const allTestsPassed =
testResults.unitTests.passed &&
testResults.integrationTests.passed &&
testResults.systemTests.passed &&
testResults.acceptanceTests.passed;

Gate Criteria:
| Test Level | What It Validates | Typical Pass Threshold |
|---|---|---|
| Unit Tests | Individual functions/classes | 90-100% pass rate |
| Integration Tests | Module interactions | 95-100% pass rate |
| System Tests | End-to-end behavior | 90-100% pass rate |
| Acceptance Tests | User requirements | 100% for critical |
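The pass thresholds in this table can be applied programmatically; a hedged sketch, assuming each test level reports passed and total counts (the result shape is illustrative, not a Babysitter API):

```javascript
// Minimum pass rate per test level, from the table above
const PASS_THRESHOLDS = {
  unitTests: 0.90,
  integrationTests: 0.95,
  systemTests: 0.90,
  acceptanceTests: 1.00 // acceptance tests must fully pass
};

// results: { unitTests: { passed: 9, total: 10 }, ... } (illustrative shape)
function testGatePassed(results) {
  return Object.entries(PASS_THRESHOLDS).every(([level, threshold]) => {
    const { passed, total } = results[level];
    return total > 0 && passed / total >= threshold;
  });
}

const example = {
  unitTests: { passed: 9, total: 10 },         // 90%, meets 90% threshold
  integrationTests: { passed: 19, total: 20 }, // 95%, meets 95% threshold
  systemTests: { passed: 10, total: 10 },      // 100%
  acceptanceTests: { passed: 5, total: 5 }     // 100%, required
};
console.log(testGatePassed(example)); // true
```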
Ensures code follows style guidelines and maintainability standards.
// Parallel code quality checks
const [lint, format, complexity] = await ctx.parallel.all([
() => ctx.task(lintTask, { files: impl.filesModified }),
() => ctx.task(formatCheckTask, { files: impl.filesModified }),
() => ctx.task(complexityTask, { files: impl.filesModified })
]);
const codeQualityGatePassed =
lint.errorCount === 0 &&
format.violations === 0 &&
complexity.maxCyclomaticComplexity < 10;

Gate Criteria:
| Check | Tool Examples | Typical Threshold |
|---|---|---|
| Lint Errors | ESLint, Pylint | 0 errors |
| Formatting | Prettier, Black | 0 violations |
| Cyclomatic Complexity | SonarQube, Radon | < 10 per function |
| Code Duplication | jscpd, CPD | < 3% duplication |
Catches bugs at compile/analysis time without running the code.
// From: gsd/iterative-convergence enhanced pattern
const [typeCheck, staticAnalysis] = await ctx.parallel.all([
() => ctx.task(typeCheckTask, { files: impl.filesModified }),
() => ctx.task(staticAnalysisTask, { files: impl.filesModified })
]);
const staticGatePassed =
typeCheck.errors.length === 0 &&
staticAnalysis.criticalIssues === 0 &&
staticAnalysis.highIssues === 0;

Gate Criteria:
| Check | What It Catches | Typical Threshold |
|---|---|---|
| Type Checking | Type mismatches, null errors | 0 type errors |
| Static Analysis | Potential bugs, code smells | 0 critical/high issues |
| Dead Code | Unreachable statements | 0 dead code blocks |
| Null Safety | Potential null dereferences | 0 null warnings |
Identifies vulnerabilities, secrets, and security anti-patterns.
// Security gate from methodologies/spec-driven-development.js
const security = await ctx.task(securityTask, {
files: impl.filesModified,
scanLevel: inputs.safetyLevel // 'standard' | 'high' | 'critical'
});
const securityGatePassed =
security.criticalVulnerabilities === 0 &&
security.highVulnerabilities === 0 &&
security.secretsDetected === 0 &&
security.dependencyVulnerabilities.critical === 0;

Gate Criteria:
| Check | What It Scans | Typical Threshold |
|---|---|---|
| SAST (Static) | SQL injection, XSS, etc. | 0 critical/high |
| Secrets Detection | API keys, passwords | 0 secrets |
| Dependency Scan | Known CVEs in packages | 0 critical CVEs |
| OWASP Top 10 | Common web vulnerabilities | 0 violations |
Ensures the implementation meets non-functional requirements.
// Performance gate for production readiness
const performance = await ctx.task(performanceCheckTask, {
implementation: impl,
thresholds: {
loadTimeMs: 1500, // First Contentful Paint
bundleSizeKb: 200, // Gzipped bundle
apiResponseP95Ms: 500, // 95th percentile
memoryUsageMb: 512 // Peak memory
}
});
const performanceGatePassed =
performance.fcp <= 1500 &&
performance.bundleSize <= 200 &&
performance.apiP95 <= 500 &&
performance.peakMemory <= 512;

Gate Criteria:
| Metric | Typical Target | Domain |
|---|---|---|
| FCP (First Contentful Paint) | < 1.5s | Frontend |
| Bundle Size | < 200KB gzipped | Frontend |
| API p95 Response | < 500ms | Backend |
| Memory Usage | < 512MB | Server |
| CPU Utilization | < 70% average | Server |
To reliably achieve scores of 90+, implement a multi-gate weighted scoring system with iterative feedback.
// Recommended weights for high-quality convergence
const QUALITY_WEIGHTS = {
// For production features
production: {
tests: 0.25, // Test coverage and pass rate
implementation: 0.25, // Code correctness
codeQuality: 0.15, // Lint, complexity, formatting
security: 0.20, // Vulnerability scanning
performance: 0.15 // Non-functional requirements
},
// For security-critical systems
securityCritical: {
tests: 0.20,
implementation: 0.20,
codeQuality: 0.10,
security: 0.35, // Higher weight for security
performance: 0.15
},
// For performance-critical systems
performanceCritical: {
tests: 0.20,
implementation: 0.20,
codeQuality: 0.10,
security: 0.15,
performance: 0.35 // Higher weight for performance
}
};

/**
* Multi-gate quality convergence targeting 90+ scores
* References: gsd/iterative-convergence.js, methodologies/spec-driven-development.js
*/
export async function process(inputs, ctx) {
const {
feature,
targetQuality = 90, // Target score
maxIterations = 10, // Allow more iterations for high targets
minImprovement = 2, // Minimum improvement per iteration
plateauThreshold = 3, // Iterations without improvement
weights = QUALITY_WEIGHTS.production
} = inputs;
let iteration = 0;
let quality = 0;
const iterationHistory = [];
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
ctx.log(`[Iteration ${iteration}/${maxIterations}] Target: ${targetQuality}`);
// ===== ACT: Implement with feedback from previous iteration =====
const previousFeedback = iteration > 1
? iterationHistory[iteration - 2].recommendations
: null;
const impl = await ctx.task(implementTask, {
feature,
iteration,
previousFeedback,
focusAreas: previousFeedback?.slice(0, 3) // Top 3 priorities
});
// ===== VALIDATE: Run all five quality gates in parallel =====
const [tests, codeQuality, staticAnalysis, security, performance] =
await ctx.parallel.all([
() => ctx.task(testGateTask, { impl }),
() => ctx.task(codeQualityGateTask, { impl }),
() => ctx.task(staticAnalysisGateTask, { impl }),
() => ctx.task(securityGateTask, { impl }),
() => ctx.task(performanceGateTask, { impl })
]);
// ===== SCORE: Calculate weighted quality score =====
const scores = {
tests: tests.score,
implementation: calculateImplementationScore(impl, tests),
codeQuality: codeQuality.score,
security: security.score,
performance: performance.score
};
quality = Object.entries(weights).reduce(
(total, [dimension, weight]) => total + (scores[dimension] * weight),
0
);
// ===== ANALYZE: Generate prioritized recommendations =====
const recommendations = generateRecommendations(scores, weights, targetQuality);
iterationHistory.push({
iteration,
quality,
scores,
recommendations,
gates: { tests, codeQuality, staticAnalysis, security, performance }
});
ctx.log(`Quality: ${quality.toFixed(1)}/${targetQuality} | ` +
`Tests: ${scores.tests} | Code: ${scores.codeQuality} | ` +
`Security: ${scores.security} | Perf: ${scores.performance}`);
// ===== EARLY EXIT: Detect plateau =====
if (iteration >= plateauThreshold) {
const recent = iterationHistory.slice(-plateauThreshold).map(r => r.quality);
const improvement = Math.max(...recent) - Math.min(...recent);
if (improvement < minImprovement) {
ctx.log(`Quality plateaued at ${quality.toFixed(1)}, stopping early`);
break;
}
}
// ===== BREAKPOINT: At key thresholds =====
const converged = quality >= targetQuality;
if (!converged && quality >= 80 && iteration > 1) {
await ctx.breakpoint({
question: `Quality at ${quality.toFixed(1)}. Continue toward ${targetQuality}?`,
title: `Iteration ${iteration} Checkpoint`,
context: {
runId: ctx.runId,
files: [{ path: `artifacts/iteration-${iteration}-report.md`, format: 'markdown' }]
}
});
}
}
// ===== FINAL VALIDATION =====
const converged = quality >= targetQuality;
return {
success: converged,
quality,
targetQuality,
iterations: iteration,
iterationHistory,
finalGates: iterationHistory[iterationHistory.length - 1].gates,
metadata: { processId: 'quality-convergence-90', timestamp: ctx.now() }
};
}
function generateRecommendations(scores, weights, target) {
// Calculate gap for each dimension
const gaps = Object.entries(scores).map(([dim, score]) => ({
dimension: dim,
score,
weight: weights[dim],
weightedGap: (100 - score) * weights[dim],
priority: (100 - score) * weights[dim] // Higher weighted gap = higher priority
}));
// Sort by priority (highest impact improvements first)
return gaps
.sort((a, b) => b.priority - a.priority)
.map(g => `Improve ${g.dimension}: currently ${g.score}, ` +
`contributes ${(g.weight * g.score).toFixed(1)} of ${(g.weight * 100).toFixed(1)} possible`);
}

For challenging targets (90+), use progressive escalation:
// Progressive targets that increase as iterations proceed
const progressiveTargets = [
{ iteration: 1, target: 70 }, // First: basic functionality
{ iteration: 3, target: 80 }, // Mid: solid implementation
{ iteration: 5, target: 85 }, // Late: polish and edge cases
{ iteration: 7, target: 90 } // Final: production ready
];
function getCurrentTarget(iteration, finalTarget) {
const applicable = progressiveTargets.filter(t => t.iteration <= iteration);
const progressiveTarget = applicable[applicable.length - 1]?.target || 70;
return Math.min(progressiveTarget, finalTarget);
}

The V-Model process (methodologies/v-model.js) implements comprehensive quality gates:
/babysitter:call use the V-Model methodology to build a user authentication system with high safety level
Or with more detail:
/babysitter:call implement user authentication using V-Model with traceability and thorough testing
Quality Gates in V-Model:
- Requirements → Acceptance Tests (validates user needs)
- System Design → System Tests (validates architecture)
- Module Design → Integration Tests (validates interfaces)
- Implementation → Unit Tests (validates code)
- Traceability Matrix (validates coverage)
The Spec-Kit process (methodologies/spec-driven-development.js) adds governance gates:
/babysitter:call use spec-driven development to build PCI-compliant payment processing
Or:
/babysitter:call build a payment flow using the spec-driven methodology with governance validation
Quality Gates in Spec-Kit:
- Constitution Validation (governance principles)
- Specification Review (requirements completeness)
- Plan-Constitution Alignment (architecture compliance)
- Task Consistency Analysis (cross-artifact validation)
- Implementation Checklists ("unit tests for English")
- User Story Validation (final acceptance)
The GSD process (gsd/iterative-convergence.js) implements feedback-driven convergence:
/babysitter:call build a shopping cart checkout flow with 90% quality target
Or:
/babysitter:call implement checkout flow using iterative convergence with max 8 iterations
Quality Gates in GSD:
- Implementation scoring
- Test execution
- Quality assessment with recommendations
- Iterative feedback loop
Build a feature with test-driven development, iterating until test coverage and quality targets are met.
export async function process(inputs, ctx) {
const { feature, targetQuality = 85, maxIterations = 5 } = inputs;
let iteration = 0;
let quality = 0;
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
ctx.log(`[Iteration ${iteration}/${maxIterations}] Starting TDD implementation...`);
// Write tests first
const tests = await ctx.task(writeTestsTask, { feature, iteration });
// Implement code to pass tests
const impl = await ctx.task(implementTask, { tests, feature });
// Run quality checks
const [coverage, lint, security] = await ctx.parallel.all([
() => ctx.task(coverageTask, {}),
() => ctx.task(lintTask, {}),
() => ctx.task(securityTask, {})
]);
// Agent scores quality
const score = await ctx.task(agentScoringTask, {
tests, impl, coverage, lint, security
});
quality = score.overall;
ctx.log(`Quality score: ${quality}/${targetQuality}`);
}
return { converged: quality >= targetQuality, iterations: iteration, quality };
}

Iteratively improve existing code until it meets quality standards.
export async function process(inputs, ctx) {
const { files, targetScore = 90, maxIterations = 10 } = inputs;
let iteration = 0;
let currentScore = 0;
// Initial assessment
currentScore = await ctx.task(assessQualityTask, { files });
ctx.log(`Initial quality score: ${currentScore}`);
while (iteration < maxIterations && currentScore < targetScore) {
iteration++;
// Identify improvements
const improvements = await ctx.task(identifyImprovementsTask, {
files,
currentScore,
targetScore
});
// Apply improvements
await ctx.task(applyImprovementsTask, { improvements });
// Re-assess
currentScore = await ctx.task(assessQualityTask, { files });
ctx.log(`Iteration ${iteration}: Quality score ${currentScore}/${targetScore}`);
}
return { achieved: currentScore >= targetScore, finalScore: currentScore };
}

Generate documentation and refine until it meets completeness standards.
export async function process(inputs, ctx) {
const { codebase, targetCompleteness = 80, maxIterations = 3 } = inputs;
let iteration = 0;
let completeness = 0;
while (iteration < maxIterations && completeness < targetCompleteness) {
iteration++;
// Generate or improve documentation
await ctx.task(generateDocsTask, { codebase, iteration });
// Assess completeness
const assessment = await ctx.task(assessDocsCompletenessTask, { codebase });
completeness = assessment.completenessScore;
ctx.log(`Documentation completeness: ${completeness}%`);
}
return { complete: completeness >= targetCompleteness, completeness };
}

Determine what quality means for your use case.
Common quality metrics:
- Test coverage percentage (e.g., 85%)
- Lint error count (e.g., 0 errors)
- Security vulnerability count (e.g., 0 critical)
- Overall quality score (e.g., 90/100)
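These metrics can be collected into a simple criteria object that the convergence loop checks each iteration; a minimal sketch (names are illustrative, not a Babysitter API):

```javascript
// Illustrative criteria object matching the metrics above
const qualityCriteria = {
  minCoverage: 85,     // test coverage percentage
  maxLintErrors: 0,    // lint error count
  maxCriticalVulns: 0, // critical security vulnerabilities
  minOverallScore: 90  // overall quality score out of 100
};

// measured: { coverage, lintErrors, criticalVulns, overallScore }
function meetsCriteria(measured, criteria) {
  return (
    measured.coverage >= criteria.minCoverage &&
    measured.lintErrors <= criteria.maxLintErrors &&
    measured.criticalVulns <= criteria.maxCriticalVulns &&
    measured.overallScore >= criteria.minOverallScore
  );
}

console.log(meetsCriteria(
  { coverage: 87, lintErrors: 0, criticalVulns: 0, overallScore: 91 },
  qualityCriteria
)); // true
```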
Prevent infinite loops by setting a maximum number of iterations.
const { targetQuality = 85, maxIterations = 5 } = inputs;

Recommendations:
- Simple improvements: 3-5 iterations
- Complex refactoring: 5-10 iterations
- Large features: 10-15 iterations
Create a loop that continues until the target is met or iterations are exhausted.
let iteration = 0;
let quality = 0;
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
// Perform work
// ...
// Measure quality
quality = await measureQuality();
ctx.log(`Iteration ${iteration}: ${quality}/${targetQuality}`);
}

Create a task that evaluates quality based on your criteria.
export const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({
kind: 'agent',
title: 'Score implementation quality',
agent: {
name: 'quality-assessor',
prompt: {
role: 'senior quality assurance engineer',
task: 'Analyze implementation quality and provide a score from 0-100',
context: {
tests: args.tests,
implementation: args.implementation,
coverage: args.coverage,
lint: args.lint,
security: args.security
},
instructions: [
'Review test quality (weight: 25%)',
'Review implementation quality (weight: 30%)',
'Review code metrics (weight: 20%)',
'Review security (weight: 15%)',
'Review alignment with requirements (weight: 10%)',
'Provide recommendations for improvement'
]
}
}
}));

Pass quality feedback to the next iteration to guide improvements.
const iterationResults = [];
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
const previousFeedback = iteration > 1
? iterationResults[iteration - 2].recommendations
: null;
const impl = await ctx.task(implementTask, {
feature,
previousFeedback // Guide improvements based on previous scoring
});
const score = await ctx.task(agentScoringTask, { impl });
iterationResults.push({
iteration,
quality: score.overall,
recommendations: score.recommendations
});
quality = score.overall;
}

| Parameter | Type | Default | Description |
|---|---|---|---|
| targetQuality | number | 85 | Target quality score (0-100) |
| maxIterations | number | 5 | Maximum number of iterations before stopping |
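Because the defaults come from destructuring, out-of-range values pass through silently; a small validation helper (illustrative, not part of Babysitter's API) can reject them up front:

```javascript
// Hedged sketch: validate the parameters above before the loop runs
function resolveParams(inputs = {}) {
  const { targetQuality = 85, maxIterations = 5 } = inputs;
  if (targetQuality < 0 || targetQuality > 100) {
    throw new RangeError('targetQuality must be between 0 and 100');
  }
  if (!Number.isInteger(maxIterations) || maxIterations < 1) {
    throw new RangeError('maxIterations must be a positive integer');
  }
  return { targetQuality, maxIterations };
}

console.log(resolveParams({ targetQuality: 90 })); // { targetQuality: 90, maxIterations: 5 }
```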
Customize how different aspects contribute to the overall score.
const scoringWeights = {
tests: 0.25, // 25% weight for test quality
implementation: 0.30, // 30% weight for implementation quality
codeQuality: 0.20, // 20% weight for code metrics
security: 0.15, // 15% weight for security
alignment: 0.10 // 10% weight for requirements alignment
};

Configure conditions that stop iteration early.
// Stop if quality plateaus (no improvement in last N iterations)
if (qualityHistory.length >= 3) {
const lastThree = qualityHistory.slice(-3);
const improvement = lastThree[2] - lastThree[0];
if (improvement < 1) {
ctx.log('Quality plateaued, stopping early');
break;
}
}

Complete process definition demonstrating all quality convergence patterns.
export async function process(inputs, ctx) {
const {
feature = 'User authentication',
targetQuality = 85,
maxIterations = 5
} = inputs;
// Phase 1: Planning
const plan = await ctx.task(agentPlanningTask, { feature });
await ctx.breakpoint({
question: `Review the plan for "${feature}". Approve to proceed?`,
title: 'Plan Review',
context: { runId: ctx.runId, files: [{ path: 'artifacts/plan.md', format: 'markdown' }] }
});
// Phase 2: Quality Convergence Loop
let iteration = 0;
let quality = 0;
const iterationResults = [];
while (iteration < maxIterations && quality < targetQuality) {
iteration++;
ctx.log(`[Iteration ${iteration}/${maxIterations}]`);
// TDD: Write tests first
const tests = await ctx.task(writeTestsTask, {
feature,
plan,
iteration,
previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null
});
// Run tests (expect failures on first iteration)
await ctx.task(runTestsTask, { testFiles: tests.testFiles, expectFailures: iteration === 1 });
// Implement to pass tests
const impl = await ctx.task(implementTask, {
feature,
tests,
iteration,
previousFeedback: iteration > 1 ? iterationResults[iteration - 2].feedback : null
});
// Run tests again
const testResults = await ctx.task(runTestsTask, { testFiles: tests.testFiles });
// Parallel quality checks
const [coverage, lint, typeCheck, security] = await ctx.parallel.all([
() => ctx.task(coverageTask, {}),
() => ctx.task(lintTask, { files: impl.filesModified }),
() => ctx.task(typeCheckTask, { files: impl.filesModified }),
() => ctx.task(securityTask, { files: impl.filesModified })
]);
// Agent quality scoring
const score = await ctx.task(agentQualityScoringTask, {
tests,
testResults,
implementation: impl,
qualityChecks: { coverage, lint, typeCheck, security },
iteration,
targetQuality
});
quality = score.overallScore;
iterationResults.push({
iteration,
quality,
feedback: score.recommendations
});
ctx.log(`Quality: ${quality}/${targetQuality}`);
if (quality >= targetQuality) {
ctx.log('Target quality achieved!');
}
}
// Final approval
await ctx.breakpoint({
question: `Quality: ${quality}/${targetQuality}. Approve for merge?`,
title: 'Final Review',
context: { runId: ctx.runId, files: [{ path: 'artifacts/final-report.md', format: 'markdown' }] }
});
return {
success: quality >= targetQuality,
iterations: iteration,
finalQuality: quality,
iterationResults
};
}

export const agentQualityScoringTask = defineTask('quality-scorer', (args, taskCtx) => ({
kind: 'agent',
title: `Score quality (iteration ${args.iteration})`,
description: 'Comprehensive quality assessment with agent',
agent: {
name: 'quality-assessor',
prompt: {
role: 'senior quality assurance engineer and code reviewer',
task: 'Analyze implementation quality across multiple dimensions',
context: {
feature: args.feature,
tests: args.tests,
testResults: args.testResults,
implementation: args.implementation,
qualityChecks: args.qualityChecks,
iteration: args.iteration,
targetQuality: args.targetQuality
},
instructions: [
'Review test quality: coverage, edge cases, assertions (weight: 25%)',
'Review implementation quality: correctness, readability (weight: 30%)',
'Review code metrics: lint, types, complexity (weight: 20%)',
'Review security: vulnerabilities, input validation (weight: 15%)',
'Review requirements alignment (weight: 10%)',
'Calculate weighted overall score (0-100)',
'Provide prioritized recommendations for improvement'
],
outputFormat: 'JSON with overallScore, scores by dimension, recommendations'
},
outputSchema: {
type: 'object',
required: ['overallScore', 'scores', 'recommendations'],
properties: {
overallScore: { type: 'number', minimum: 0, maximum: 100 },
scores: {
type: 'object',
properties: {
tests: { type: 'number' },
implementation: { type: 'number' },
codeQuality: { type: 'number' },
security: { type: 'number' },
alignment: { type: 'number' }
}
},
recommendations: { type: 'array', items: { type: 'string' } }
}
}
},
io: {
inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
}
}));

- Set Realistic Targets: Aim for achievable quality scores (80-90 is often reasonable)
- Limit Iterations: Prevent runaway loops with sensible limits (5-10 iterations typically)
- Use Parallel Checks: Run independent quality checks concurrently for efficiency
- Provide Feedback: Pass recommendations from scoring to subsequent iterations
- Log Progress: Track quality scores across iterations for visibility
- Include Breakpoints: Add approval gates at key milestones
Symptom:
Iteration 1: Quality 65/100
Iteration 2: Quality 66/100
Iteration 3: Quality 65/100
Iteration 4: Quality 67/100
Iteration 5: Quality 66/100
Target not met: 85/100
Causes:
- Quality target is unrealistic for the codebase
- Scoring criteria are too strict
- Fundamental issues blocking improvement
Solutions:
- Review iteration feedback to identify blocking issues: "What recommendations came from my quality scoring?"
- Adjust quality target: `const { targetQuality = 75 } = inputs; // Lower target`
- Increase iteration limit: `const { maxIterations = 10 } = inputs; // More iterations`
- Review scoring weights for balance
Symptom: Process runs for many iterations before converging.
Cause: Target is too high or improvements are too granular.
Solutions:
- Implement early exit on plateau:

  const recentScores = iterationResults.slice(-3).map(r => r.quality);
  if (Math.max(...recentScores) - Math.min(...recentScores) < 2) {
    ctx.log('Quality plateaued, stopping early');
    break;
  }

- Increase improvement scope per iteration
- Lower quality target to a realistic level
Symptom: Quality scores vary significantly between iterations without clear reason.
Cause: Non-deterministic scoring or external factors.
Solution:
- Use deterministic scoring criteria
- Ensure `ctx.now()` is used instead of `Date.now()` for timestamps
- Review agent scoring prompts for consistency
Symptom: Each iteration takes several minutes.
Cause: Sequential execution of independent tasks.
Solution: Use parallel execution:
// Slow: Sequential
const coverage = await ctx.task(coverageTask, {});
const lint = await ctx.task(lintTask, {});
const security = await ctx.task(securityTask, {});
// Fast: Parallel
const [coverage, lint, security] = await ctx.parallel.all([
() => ctx.task(coverageTask, {}),
() => ctx.task(lintTask, {}),
() => ctx.task(securityTask, {})
]);

- Process Definitions - Learn to create quality convergence processes
- Parallel Execution - Optimize quality checks with parallelism
- Breakpoints - Add approval gates to quality convergence workflows
- Best Practices - Patterns for setting targets, custom scoring strategies, and balancing speed vs thoroughness
- Process Library - Browse 2,000+ pre-built processes with quality convergence
- Two-Loops Architecture - Deep dive into the evidence-driven completion model
Babysitter offers two levels of reusable workflows:
Quality convergence works with ANY of Babysitter's 19+ methodologies - not just TDD. Methodologies define your development approach:
| Methodology | Best For | Quality Focus |
|---|---|---|
| TDD Quality Convergence | Test-first development | Test coverage, regression prevention |
| GSD (Get Stuff Done) | Rapid prototyping | Working software, iteration speed |
| Spec-Kit | Enterprise/governance | Specification compliance, audit trails |
| BDD/Specification by Example | Team collaboration | Acceptance criteria, living documentation |
| Domain-Driven Design | Complex business domains | Domain model integrity, bounded contexts |
Browse methodologies:
Beyond methodologies, Babysitter includes 2,000+ domain-specific processes for specific tasks:
| Domain | Processes | Examples |
|---|---|---|
| Development | 680+ | Web APIs, mobile apps, DevOps pipelines |
| Business | 430+ | Legal contracts, HR workflows, marketing campaigns |
| Science & Engineering | 550+ | Quantum algorithms, aerospace systems, biomedical devices |
| Social Sciences | 150+ | Research methodologies, survey analysis |
Browse processes:
- Process Library - Full catalog with descriptions
- Specializations folder
| Your Goal | Next Step |
|---|---|
| Run a quality convergence workflow | Try /babysitter:call build a feature with 85% quality target |
| Build your own convergence loop | Copy the TDD example above and customize the scoring |
| Add more quality gates | See the Five Quality Gate Categories section |
| Debug a stuck convergence | Check Best Practices - Debugging |
| Understand the architecture | Read Two-Loops Architecture |
Quality convergence enables automated iterative improvement until defined quality targets are met. Combine quality scoring, feedback loops, and sensible iteration limits to ensure consistent, high-quality outputs. Use parallel execution for efficiency and breakpoints for human oversight at critical milestones.
Key Takeaways:
- Set realistic targets - Start with 80-85, work up to 90+
- Use multiple gate types - Tests + lint + security + performance
- Pass feedback between iterations - AI learns from each failure
- Detect plateaus early - Don't waste iterations on no improvement
- Parallelize independent checks - Faster iterations mean faster convergence