@@ -17,7 +17,7 @@ Related: [Configuration](configuration.md) | [Grounding Verification](grounding-
 SPECTRA uses two AI models per run: a **generator** (behavior analysis + test
 creation) and a **critic** (grounding verification). Choosing the right
 combination determines quality, speed, and cost. This guide is based on real
-production data from April 11, 2026.
+production data: 1,261 test cases generated across 7 suites for $0.00.

 ---

@@ -89,7 +89,7 @@ verification, zero critic cost.
 ```

 Cost per `--count 20` run: ~4 PRs (analysis + generation batches). Critic is
-free. ~26,000 tests/month on Pro+.
+free. Real-world result: 1,261 tests across 7 suites = 75 PRs total.

 ### Preset 2: Zero Cost

@@ -139,45 +139,56 @@ Cost per `--count 20` run: ~7 PRs (critic only). Generation is free.

 ## Real Production Run Data

-Actual results from April 11, 2026. Generator: Claude Sonnet 4.5. Critic:
-GPT-4.1. Both via `github-models` provider on Copilot Pro+.
+Actual results from a full production run. Generator: Claude Sonnet 4.5. Critic:
+GPT-4.1 (parallel, `max_concurrent: 5`). Both via `github-models` provider
+on Copilot Pro+. Some suites were regenerated multiple times during testing.

 ### Run Results

-| Suite | Tests | Batches | Gen Time | Critic Time | Total | PRs Used |
-|-------|-------|---------|----------|-------------|-------|----------|
-| Standard Calculator | 238 | 12 | 22m26s | 23m02s | 46m19s | 13 |
-| Unit Converter | 178 (163 written, 15 rejected) | 9 | 18m03s | 17m43s | 36m25s | 10 |
-| **Total** | **416** | **21** | **40m29s** | **40m45s** | **82m44s** | **24** |
+| Suite | Tests Generated | Gen Time | Critic Time | Total | PRs Used |
+|-------|-----------------|----------|-------------|-------|----------|
+| Standard Calculator | 238 | 22m26s | 23m02s | 46m19s | 13 |
+| Unit Converter | 181 | 18m34s | 17m58s | 37m20s | 11 |
+| Date Calculation | 398 (2 runs) | 36m07s | 43m08s | 47m49s | 23 |
+| General App Features | 100 | 12m49s | 10m22s | 23m37s | 7 |
+| Scientific Calculator | 135 | 11m31s | 13m19s | 18m15s | 8 |
+| Programmer Calculator | 117 | 12m20s | 14m57s | 16m02s | 7 |
+| Graphing Calculator | 92 | 11m06s | 10m08s | 13m44s | 6 |
+| **Total** | **1,261** | **~2h05m** | **~2h13m** | **~3h23m** | **~75** |

 ### Token Consumption

 | Suite | Input Tokens | Output Tokens | Total |
 |-------|--------------|---------------|-------|
 | Standard Calculator | 5,898,939 | 184,274 | 6,083,213 |
-| Unit Converter | 3,940,480 | 162,342 | 4,102,822 |
-| **Total** | **9,839,419** | **346,616** | **10,186,035** |
+| Unit Converter | 4,157,801 | 164,090 | 4,321,891 |
+| Date Calculation | 9,191,320 | 341,179 | 9,532,499 |
+| General App Features | 2,447,233 | 101,976 | 2,549,209 |
+| Scientific Calculator | 3,319,320 | 101,543 | 3,420,863 |
+| Programmer Calculator | 2,819,811 | 111,662 | 2,931,473 |
+| Graphing Calculator | 2,376,626 | 85,621 | 2,462,247 |
+| **Total** | **30,211,050** | **1,090,345** | **31,301,395** |

 ### Per-Phase Timing

 | Phase | Avg per call | Notes |
 |-------|--------------|-------|
 | Analysis (Sonnet) | 25–148s | Varies by doc complexity. Sonnet finds 200+ behaviors; GPT-4.1 finds ~40 |
 | Generation batch (Sonnet, 20 tests) | ~110s | ~5.5s per test |
-| Critic call (GPT-4.1) | ~5.5s | Sequential; parallelizable to ~1s with `max_concurrent: 5` |
+| Critic call (GPT-4.1, parallel ×5) | ~6s per call, ~1.2s effective | 5 concurrent calls reduce wall time by ~80% |

 ---

 ## Cost Comparison

-### Same workload: 416 tests, April 11, 2026
+### Full workload: 1,261 tests across 7 suites

 | Provider | Input Cost | Output Cost | Total |
 |----------|------------|-------------|-------|
-| **Copilot Pro+ (github-models)** | included | included | **$0.00** (24 of 1,500 PRs) |
-| Copilot Pro overage ($0.04/PR) | n/a | n/a | **$0.96** |
-| Azure AI Foundry (Sonnet 4.5) | $29.52 | $5.20 | **$34.72** |
-| Anthropic API direct | $29.52 | $5.20 | **$34.72** |
+| **Copilot Pro+ (github-models)** | included | included | **$0.00** (~75 of 1,500 PRs) |
+| Copilot Pro overage ($0.04/PR) | n/a | n/a | **$3.00** |
+| Azure AI Foundry (Sonnet 4.5) | $90.63 | $16.36 | **$106.99** |
+| Anthropic API direct | $90.63 | $16.36 | **$106.99** |
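
The pay-per-token figures in the cost table can be reproduced from the totals in the Token Consumption table. The sketch below assumes rates of $3 per million input tokens and $15 per million output tokens for Claude Sonnet 4.5; the guide does not state these rates, and they are used here only because they reproduce the table's figures. The function and constant names are illustrative and not part of SPECTRA.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed rates (not stated in the guide), in USD per million tokens.
INPUT_USD_PER_MTOK = Decimal("3.00")
OUTPUT_USD_PER_MTOK = Decimal("15.00")
# Overage price per premium request, taken from the cost table.
OVERAGE_USD_PER_PR = Decimal("0.04")


def usd(amount):
    """Round a Decimal amount to whole cents."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def token_cost(input_tokens, output_tokens):
    """Return (input cost, output cost, total) for pay-per-token billing."""
    million = Decimal(1_000_000)
    input_cost = usd(Decimal(input_tokens) / million * INPUT_USD_PER_MTOK)
    output_cost = usd(Decimal(output_tokens) / million * OUTPUT_USD_PER_MTOK)
    return input_cost, output_cost, input_cost + output_cost


def overage_cost(premium_requests):
    """Return the cost of premium requests billed at the overage price."""
    return usd(Decimal(premium_requests) * OVERAGE_USD_PER_PR)


# Totals from the Token Consumption table (1,261 tests, 7 suites).
input_cost, output_cost, total = token_cost(30_211_050, 1_090_345)
print(input_cost, output_cost, total)  # 90.63 16.36 106.99
print(overage_cost(75))                # 3.00
```

Under the same assumed rates, the earlier 416-test totals (9,839,419 input and 346,616 output tokens) come to $29.52 + $5.20 = $34.72, which matches the removed rows.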

 ### Full monthly capacity at Pro+ (1,500 PRs)

@@ -187,6 +198,18 @@ GPT-4.1. Both via `github-models` provider on Copilot Pro+.
 | Azure AI Foundry equivalent | **~$2,169** |
 | Copilot overage equivalent | **$60** (1,500 × $0.04) |

+### Premium Request Budget
+
+After generating 1,261 tests across all 7 suites (within a single billing cycle):
+
+| Metric | Value |
+|--------|-------|
+| PRs consumed (total account) | 191.52 of 1,500 |
+| PRs from SPECTRA runs | ~75 (Sonnet generation + analysis only) |
+| PRs from VS Code / other usage | ~116 |
+| PRs remaining | 1,308 (19 days left in cycle) |
+| Billed amount | $0.00 |
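
The budget rows are related by simple arithmetic, sketched below as a quick consistency check. The input values come from the Premium Request Budget table; the helper name and dictionary keys are hypothetical, and the SPECTRA figure of 75 is the approximate value reported in the table.

```python
from decimal import Decimal

# Values from the Premium Request Budget table.
MONTHLY_ALLOWANCE = Decimal("1500")
CONSUMED_TOTAL = Decimal("191.52")
SPECTRA_PRS = Decimal("75")  # approximate, per the table


def budget_summary(allowance, consumed, spectra):
    """Derive the remaining, non-SPECTRA, and percentage-used figures."""
    return {
        "remaining": allowance - consumed,
        "other_usage": consumed - spectra,
        "used_pct": (consumed / allowance * 100).quantize(Decimal("0.1")),
    }


summary = budget_summary(MONTHLY_ALLOWANCE, CONSUMED_TOTAL, SPECTRA_PRS)
print(summary["remaining"])    # 1308.48 (the table shows 1,308)
print(summary["other_usage"])  # 116.52 (the table shows ~116)
print(summary["used_pct"])     # 12.8
```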
+
 > The 55× price difference between Copilot Pro+ and Azure pay-per-token exists
 > because Copilot is a subscription model: Microsoft subsidizes heavy users
 > with revenue from lighter users. SPECTRA's workload (hundreds of structured