
Commit 4fd4926

docs: document v1.41.0 generation tuning + .spectra-debug.log
docs/configuration.md:
- New section "ai.generation_timeout_minutes, ai.generation_batch_size, ai.debug_log_enabled" with the full property table
- .spectra-debug.log format spec with a real session example showing BATCH START / BATCH OK / BATCH TIMEOUT lines
- Two example configs: slow / reasoning model (DeepSeek-V3, 20min × 8) and fast model with the debug log silenced
- Note about the new multi-line timeout error message and its remediation paths

docs/cli-reference.md:
- `spectra ai generate` gains a "Tuning for slow models" callout pointing at the three new ai.* config fields and the .spectra-debug.log file

PROJECT-KNOWLEDGE.md:
- Config schema example shows the three new ai.* fields
- Tuning note for slow / reasoning-class models

CLAUDE.md:
- Recent Changes entry for the v1.41.0 fix covering the new config fields, the .spectra-debug.log format, the improved timeout error, and the symptom this fixes (DeepSeek-V3 / Azure-large generators hitting the hardcoded 5-minute budget on batches of 30)
1 parent 60e6348 commit 4fd4926


4 files changed: +93 −1 lines changed


CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -209,6 +209,7 @@ spectra config list-automation-dirs # List dirs with existence s
- **Tests:** xUnit with structured results (never throw on validation errors)

## Recent Changes
- v1.41.0 fix: configurable per-batch generation timeout + batch size + .spectra-debug.log. The 5-minute SDK timeout in `CopilotGenerationAgent.GenerateTestsAsync` (hardcoded since spec 009) is now configurable via three new optional `ai.*` fields: `generation_timeout_minutes` (default 5, minimum 1), `generation_batch_size` (default 30, ≤0 falls back to default), `debug_log_enabled` (default true). `GenerateHandler` reads `generation_batch_size` from config and renders "Batch N/M" in progress messages. `CopilotGenerationAgent` writes timestamped per-batch lines to `.spectra-debug.log` in the project root: `BATCH START requested=N model=M provider=P timeout=Tmin`, `BATCH OK requested=N elapsed=Ts`, `BATCH TIMEOUT requested=N model=M configured_timeout=Tmin`. Best-effort writes — never throws or blocks generation. Improved timeout error message names the actual model, batch size, and configured minutes, and lists three remediation paths (bump timeout, shrink batch, reduce --count) with copy-paste-ready spectra.config.json snippets, plus a pointer to `.spectra-debug.log`. Live status messages now include batch number and total ("Batch 3/13") plus the configured timeout. Symptom this fixes: with slower / reasoning-class models (DeepSeek-V3, GPT-4 Turbo with long contexts, large Azure Anthropic deployments), batches of 30 tests routinely exceeded the hardcoded 5-minute budget — `--count 100` would time out on the first batch even though `--count 1` worked. Workaround for users hitting the timeout is now config-only — no code change required. Documentation: `docs/configuration.md` gains a full `ai.generation_timeout_minutes` / `ai.generation_batch_size` / `ai.debug_log_enabled` section with example slow-model and fast-model configs and the `.spectra-debug.log` format spec; `PROJECT-KNOWLEDGE.md` config schema example shows the new fields with a tuning note. All 1551 tests still pass.
- 039-unify-critic-providers: ✅ COMPLETE - Aligned the critic provider validation set with the generator provider list. `CriticFactory.SupportedProviders` now contains exactly the canonical 5 (`github-models`, `azure-openai`, `azure-anthropic`, `openai`, `anthropic`) — same as `ai.providers[].name`. Removed `azure-deepseek` (not in the generator set) and `google` (Copilot SDK cannot route to it). New private `LegacyAliases` dictionary maps `github` → `github-models` (soft alias with one-line stderr deprecation warning); new private `HardErrorProviders` set contains `google` and produces a hard validation error listing all 5 supported providers. New `internal static ResolveProvider(string?)` helper centralizes the lookup; called from both `TryCreate` and `TryCreateAsync` (in the async path BEFORE the Copilot availability check so unknown providers fail fast). New public `DefaultProvider = "github-models"` constant; empty/whitespace input falls back to it. Case-insensitive matching with lowercase normalization. `IsSupported` returns true for canonical names AND legacy `github`, false for `google`/unknown. `CriticConfig.Provider` XML doc updated to list the canonical 5; `GetEffectiveModel()` and `GetDefaultApiKeyEnv()` switch statements gain explicit cases for `github-models`, `azure-openai`, `azure-anthropic` (defaults: `gpt-4o-mini`/`gpt-4o-mini`/`claude-3-5-haiku-latest` and `GITHUB_TOKEN`/`AZURE_OPENAI_API_KEY`/`AZURE_ANTHROPIC_API_KEY`); legacy `github`/`google` cases retained for read-side safety. Existing `CriticFactoryTests`, `CriticAuthTests`, `VerificationIntegrationTests`, `CriticConfigTests` updated to drop `google` from "supported" assertions and add `azure-openai`/`azure-anthropic` to the canonical theory data. 
Docs updated: `docs/configuration.md` lists the canonical 5 with two new examples (Azure-only billing, cross-provider critic); `docs/grounding-verification.md` example uses `github-models` instead of `google`, supported list updated, legacy-values note added. New `CriticFactoryProviderTests.cs` with 8 tests (azure-openai accept, azure-anthropic accept, case-insensitive normalization, github→github-models alias with stderr warning, google hard error with 5-provider listing, unknown provider error, empty fallback, whitespace fallback). Backward compatible: all existing configurations on `openai`/`anthropic`/`github-models` continue to work unchanged. 1551 total tests passing (491 Core + 709 CLI + 351 MCP).
- 038-testimize-integration: ✅ COMPLETE - Optional integration with the external Testimize.MCP.Server tool for algorithmic test data optimization (BVA, EP, pairwise, ABC). Disabled by default; zero impact on users who do not opt in. New `testimize` section in `spectra.config.json` (`enabled` defaults to `false`, `mode`, `strategy`, optional `settings_file`, `mcp.command`/`args`, optional `abc_settings`). New models in `src/Spectra.Core/Models/Config/TestimizeConfig.cs` (`TestimizeConfig`, `TestimizeMcpConfig`, `TestimizeAbcSettings`). New `SpectraConfig.Testimize` property defaulting to `new()`. New `src/Spectra.CLI/Agent/Testimize/TestimizeMcpClient.cs` — `IAsyncDisposable` wrapper around the Testimize child process with stdio JSON-RPC framing (`Content-Length` header), `StartAsync` (returns false on missing tool, never throws), `IsHealthyAsync` (5s budget), `CallToolAsync` (30s budget, returns null on any failure), idempotent `DisposeAsync` (kills child process tree). New `src/Spectra.CLI/Agent/Testimize/TestimizeTools.cs` with two `AIFunction` factories: `CreateGenerateTestDataTool` (forwards to Testimize MCP `generate_hybrid_test_cases`/`generate_pairwise_test_cases`/`generate_combinatorial_test_cases` based on strategy) and `CreateAnalyzeFieldSpecTool` (local heuristic extractor for "X to Y characters", "between X and Y", "valid email", "required field"). Tool description explicitly mentions "boundary values", "equivalence classes", and "pairwise" so the model knows when to call them. New `TestimizeDetector.IsInstalledAsync` shells out to `dotnet tool list -g` (5s timeout, never throws). `CopilotGenerationAgent.GenerateTestsAsync` conditionally registers the Testimize tools after the existing 7 tools when `_config.Testimize.Enabled` and the MCP client passes a health probe; the client is disposed in a `finally` block on every exit path (success, exception, cancellation). 
`BehaviorAnalyzer.BuildAnalysisPrompt` and `GenerationAgent.BuildFullPrompt` populate a new `testimize_enabled` placeholder (`"true"` or `""`). `behavior-analysis.md` and `test-generation.md` templates gain `{{#if testimize_enabled}}` blocks with technique-aware instructions. New `spectra testimize check` CLI command (`src/Spectra.CLI/Commands/Testimize/TestimizeCheckHandler.cs` + `TestimizeCommand.cs`) reports enabled/installed/healthy/mode/strategy/settings status, supports `--output-format json`, never starts the MCP process when disabled (FR-028), includes install instruction when not installed. New `TestimizeCheckResult` typed result. Embedded `Templates/spectra.config.json` updated to include the `testimize` section with `enabled: false` so `spectra init` writes it by default. Existing `spectra init` already refuses to overwrite an existing config without `--force`, so re-init preservation is automatic. Graceful degradation everywhere: tool not installed → warning + AI fallback; process crashes mid-generation → catch + AI fallback; server returns null/empty/garbage → AI fallback; per-call 30s timeout → AI fallback. Exit code 0 in all degradation paths. New tests: `TestimizeConfigTests` (6), `TestimizeMcpClientTests` (4), `TestimizeMcpClientGracefulTests` (4), `TestimizeToolsTests` (2), `AnalyzeFieldSpecTests` (7), `TestimizeDetectorTests` (1), `TestimizeConditionalBlockTests` (4), `TestimizeCheckHandlerTests` (3) — 31 net new tests (488 Core + 699 CLI + 351 MCP = 1538 total passing).
- 037-istqb-test-techniques: ✅ COMPLETE - ISTQB test design techniques in prompt templates. All five built-in prompt templates (`src/Spectra.CLI/Prompts/Content/*.md`) updated to teach the AI six ISTQB black-box techniques: Equivalence Partitioning (EP), Boundary Value Analysis (BVA), Decision Table (DT), State Transition (ST), Error Guessing (EG), Use Case (UC). `behavior-analysis.md` rewritten with six TECHNIQUE sections, distribution guidelines (40%-of-any-category cap, ≥4 BVA per range, mandatory invalid state transitions), and a new `technique` field in the JSON output schema. `test-generation.md` adds a "TEST DESIGN TECHNIQUE RULES" section with WRONG/RIGHT examples for BVA exact values, EP class naming, DT condition values, ST starting/action/resulting state, and EG concrete error scenarios. `test-update.md` adds a "Technique Completeness Check" so updates flag tests as OUTDATED when docs introduce new ranges/rules/states. `critic-verification.md` adds a "Technique Verification" section returning PARTIAL for BVA boundary mismatches and HALLUCINATED for DT conditions not in docs. `criteria-extraction.md` adds optional `technique_hint` per criterion. New `IdentifiedBehavior.Technique` string field (default `""` for backward compatibility), new `BehaviorAnalysisResult.TechniqueBreakdown` map (excludes empty techniques), new `AcceptanceCriterion.TechniqueHint` `string?` (omitted from YAML when null). `BehaviorAnalyzer.BuildAnalysisPrompt` legacy fallback updated with condensed ISTQB instructions. `GenerateAnalysis.TechniqueBreakdown` always serialized as `{}` when empty for stable SKILL/CI contract. `AnalysisPresenter` and `ProgressPageWriter` render a "Technique Breakdown" section beneath the existing category breakdown in fixed display order BVA, EP, DT, ST, EG, UC. `CriteriaExtractor` parses and normalizes the technique hint; `LoadCriteriaContextAsync` forwards `technique_hint` into the generation prompt. 
`BuildPrompt` reinforces ISTQB technique application in the per-batch user prompt. `spectra-generate` SKILL shows both category and technique breakdowns from analysis JSON. `spectra-help` SKILL gains an "ISTQB Test Design Techniques" section pointing users to `spectra prompts reset --all`. `spectra-quickstart` SKILL flagged as updated with the new techniques. `spectra-generation` agent prompt notes the technique breakdown. Existing user-customized `.spectra/prompts/*.md` files are NOT auto-overwritten (spec 022 hash-tracking preserved); users opt in via `spectra prompts reset` or `spectra update-skills`. New tests: `BehaviorAnalysisTemplateTests` (6), `BehaviorAnalyzerTechniqueTests` (3), `BehaviorAnalyzerLegacyFallbackTests` (3), `BehaviorAnalysisResultTechniqueTests` (2), `GenerateResultTechniqueBreakdownTests` (3), `ProgressPageTechniqueBreakdownTests` (3), `TestGenerationTemplateTests` (6), `PromptTemplateMigrationTests` (2), `TestUpdateTemplateTests` (5), `CriticVerificationTemplateTests` (5), `CriteriaExtractionTemplateTests` (6), `AcceptanceCriterionTechniqueHintTests` (4). One existing test (`BuildPrompt_WithoutLoader_UsesLegacy`) updated to assert on the new `boundary` and `error_handling` categories (replacing the dropped `performance` assertion). 48 new tests. 1507 total tests passing (482 Core + 674 CLI + 351 MCP).
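The batch-splitting arithmetic described in the v1.41.0 entry above ("Batch N/M" progress labels, `generation_batch_size` defaulting to 30, values ≤ 0 falling back to the default) can be sketched outside the codebase. This is an illustrative Python model of the documented rules, not the actual C# in `GenerateHandler`:

```python
def plan_batches(total_count: int, batch_size: int = 30) -> list[int]:
    """Split a --count request into per-batch sizes, following the
    documented rules: default batch size 30, values <= 0 fall back
    to the default, and the remainder becomes a final partial batch."""
    if batch_size <= 0:
        batch_size = 30
    batches = [batch_size] * (total_count // batch_size)
    if total_count % batch_size:
        batches.append(total_count % batch_size)
    return batches

plan = plan_batches(100, 8)
print(len(plan))           # 13 -> progress renders "Batch 1/13" ... "Batch 13/13"
print(plan[-1])            # 4  (final partial batch)
print(plan_batches(100))   # [30, 30, 30, 10] with the default batch size
```

With `generation_batch_size: 8`, a `--count 100` run yields the thirteen batches the entry mentions, and each batch gets its own `generation_timeout_minutes` budget.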

PROJECT-KNOWLEDGE.md

Lines changed: 12 additions & 1 deletion
@@ -285,7 +285,10 @@ Three-section unified coverage with distinct semantics:
"tests": { "dir": "tests/", "id_prefix": "TC", "id_start": 100 },
"ai": {
  "providers": [{ "name": "azure-anthropic", "model": "claude-sonnet-4-5", "enabled": true }],
  "critic": { "provider": "azure-anthropic", "model": "claude-sonnet-4-5" },
  "generation_timeout_minutes": 5,
  "generation_batch_size": 30,
  "debug_log_enabled": true
},
"coverage": {
  "criteria_file": "docs/criteria/_criteria_index.yaml",
@@ -301,6 +304,14 @@ Three-section unified coverage with distinct semantics:
}
```

> **Generation tuning (v1.41.0)**: `ai.generation_timeout_minutes` (default 5)
> and `ai.generation_batch_size` (default 30) are the per-batch knobs for
> `spectra ai generate`. Slower / reasoning models (DeepSeek-V3, large Azure
> deployments) typically need 15–20 min and batches of 6–10.
> `ai.debug_log_enabled` (default true) writes per-batch timing diagnostics to
> `.spectra-debug.log` in the project root — inspect this file to dial the
> tuning knobs to your model's actual throughput.

> **Spec 039**: critic providers now use the same five names as the generator
> (`github-models`, `azure-openai`, `azure-anthropic`, `openai`, `anthropic`).
> Legacy `github` is a soft alias; legacy `google` is rejected.
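The tuning note above amounts to matching both knobs to the per-test latency observed in `.spectra-debug.log`. A hypothetical heuristic could look like the following — the function, its target-duration parameter, and the safety factor are illustrative assumptions, not part of SPECTRA:

```python
import math

# Hypothetical helper, not part of SPECTRA: derive knob suggestions from
# the per-test latency observed in .spectra-debug.log BATCH OK lines.
def suggest_tuning(seconds_per_test: float,
                   target_batch_minutes: float = 10.0,
                   safety_factor: float = 2.0) -> tuple[int, int]:
    # Batch size: how many tests fit in the target batch duration,
    # capped at the documented default of 30.
    batch = min(30, max(1, int(target_batch_minutes * 60 / seconds_per_test)))
    # Timeout: expected batch time padded by a safety factor,
    # respecting the documented minimum of 1 minute.
    timeout = max(1, math.ceil(batch * seconds_per_test * safety_factor / 60))
    return batch, timeout

# DeepSeek-V3-style latency (~35 s/test, cf. elapsed=277.6s for 8 tests)
print(suggest_tuning(35.0))  # (17, 20)
# Fast model (~2 s/test): the defaults are already generous
print(suggest_tuning(2.0))   # (30, 2)
```

The point is the shape of the trade-off, not the exact numbers: shrinking the batch lowers each batch's wall time, while the timeout must cover batch size × per-test latency with headroom.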

docs/cli-reference.md

Lines changed: 2 additions & 0 deletions
@@ -196,6 +196,8 @@ Duplicate detection warns when a new test has >80% title similarity to an existi

**Exit codes:** `0` = success, `1` = error, `3` = missing required args with `--no-interaction`.

**Tuning for slow models** (v1.41.0+): generation runs in batches of `ai.generation_batch_size` tests (default 30), each subject to `ai.generation_timeout_minutes` (default 5). Slower / reasoning-class models (DeepSeek-V3, large Azure deployments, GPT-4 Turbo with long contexts) typically need `generation_timeout_minutes: 15–20` and `generation_batch_size: 6–10`. When `ai.debug_log_enabled` is true (the default), each batch writes a timestamped `BATCH START` / `BATCH OK` / `BATCH TIMEOUT` line to `.spectra-debug.log` in the project root with the model, provider, requested count, and elapsed seconds — inspect this file to dial the knobs precisely. See [Configuration → ai.generation_timeout_minutes](configuration.md) for the full reference, example configs, and the timeout error message format.

### `spectra ai update`

Update existing tests when documentation changes. Classifies tests as UP_TO_DATE, OUTDATED, ORPHANED, or REDUNDANT.
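The fallback semantics the tuning callout depends on (timeout minimum of 1, batch size ≤ 0 reverting to 30, debug log defaulting to true) can be modeled in a few lines. This is a sketch of the documented behavior, not the actual C# config loader:

```python
import json

def effective_settings(config: dict) -> dict:
    """Apply the documented defaults and fallbacks for the three ai.* knobs."""
    ai = config.get("ai", {})
    # Documented minimum effective timeout is 1 minute.
    timeout = max(1, int(ai.get("generation_timeout_minutes", 5)))
    batch = int(ai.get("generation_batch_size", 30))
    if batch <= 0:  # documented: <= 0 falls back to the default
        batch = 30
    return {
        "timeout_minutes": timeout,
        "batch_size": batch,
        "debug_log": bool(ai.get("debug_log_enabled", True)),
    }

cfg = json.loads('{"ai": {"generation_timeout_minutes": 20, "generation_batch_size": 8}}')
print(effective_settings(cfg))
# {'timeout_minutes': 20, 'batch_size': 8, 'debug_log': True}
```

An empty config yields the defaults (5 / 30 / true), so all three fields remain optional.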

docs/configuration.md

Lines changed: 78 additions & 0 deletions
@@ -107,6 +107,84 @@ SPECTRA is configured via `spectra.config.json` at the repository root. Run `spe

All providers are accessed through the GitHub Copilot SDK.

### `ai.generation_timeout_minutes`, `ai.generation_batch_size`, `ai.debug_log_enabled`

Per-batch tuning for `spectra ai generate` and an append-only diagnostic log.
These knobs matter for slower / reasoning-class models (DeepSeek-V3, large
Azure Anthropic deployments, GPT-4 Turbo with long contexts) where the default
5-minute / 30-tests-per-batch budget is too small.

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| `generation_timeout_minutes` | int | `5` | Per-batch SDK call timeout. The timer measures the entire batch round-trip, including all tool calls the AI makes. Slower models may need 10–20+ minutes. Minimum effective value is 1. |
| `generation_batch_size` | int | `30` | Number of tests requested per AI call. Smaller batches reduce per-batch latency on slow models at the cost of more total round-trips. Pair with `generation_timeout_minutes`. Values ≤ 0 fall back to the default. |
| `debug_log_enabled` | bool | `true` | When true, appends per-batch diagnostics to `.spectra-debug.log` in the project root. Best-effort writes; never blocks generation. Set to `false` to silence. |

#### `.spectra-debug.log` format

When `debug_log_enabled` is true, every batch writes one or more timestamped
lines to `.spectra-debug.log`. Example session:

```text
2026-04-11T01:30:14.221Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
2026-04-11T01:34:51.778Z [generate] BATCH OK requested=8 elapsed=277.6s
2026-04-11T01:34:53.014Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
2026-04-11T01:39:10.004Z [generate] BATCH OK requested=8 elapsed=257.0s
2026-04-11T01:39:11.221Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
2026-04-11T01:59:11.330Z [generate] BATCH TIMEOUT requested=8 model=DeepSeek-V3.2 configured_timeout=20min
```

Lines starting with `BATCH START` mark the beginning of an AI call (model,
provider, batch size, configured timeout). `BATCH OK` marks success and
records the elapsed wall time. `BATCH TIMEOUT` marks a timeout failure with
the configured budget for cross-reference. Use this to dial
`generation_timeout_minutes` and `generation_batch_size` precisely to your
model's actual throughput.

The file is **append-only** — it grows over time. Delete it when stale.

In addition, the existing `.spectra-debug-response.txt` file (always written,
not gated by this flag) contains the raw text of the most recent AI response
for the current batch. Useful when the AI returns a malformed JSON array.

#### Example: configuring for a slow / reasoning model

```json
{
  "ai": {
    "providers": [
      { "name": "azure-openai", "model": "DeepSeek-V3.2", "enabled": true }
    ],
    "generation_timeout_minutes": 20,
    "generation_batch_size": 8,
    "debug_log_enabled": true
  }
}
```

With this config, `spectra ai generate --count 100` splits into 13 batches
(12 batches of 8 tests plus a final batch of 4), every batch has a 20-minute
budget, and each batch's actual elapsed time is logged so you can re-tune later.

#### Example: silencing the debug log on a fast model

```json
{
  "ai": {
    "providers": [{ "name": "github-models", "model": "gpt-4o", "enabled": true }],
    "debug_log_enabled": false
  }
}
```

#### Timeout error message

When a batch hits the configured budget, the generation result includes a
multi-line error message that names the model, the batch size, the configured
timeout, and three remediation paths (bump the timeout, shrink the batch, or
reduce `--count`). The same message points at `.spectra-debug.log` for
per-batch timing context.

### `ai.critic`

Grounding verification configuration. See [Grounding Verification](grounding-verification.md).
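Because the `.spectra-debug.log` format documented above is line-oriented, it is easy to post-process when dialing the knobs. A small sketch — the regexes mirror the documented line shapes, and `summarize` is a hypothetical helper, not part of the CLI:

```python
import re

# Patterns matching the documented BATCH line shapes.
EVENT = re.compile(r"\[generate\] BATCH (START|OK|TIMEOUT)\b")
ELAPSED = re.compile(r"elapsed=([\d.]+)s")

def summarize(log_text: str) -> dict:
    """Count batch outcomes and track the slowest successful batch."""
    counts = {"START": 0, "OK": 0, "TIMEOUT": 0}
    elapsed = []
    for line in log_text.splitlines():
        m = EVENT.search(line)
        if not m:
            continue
        counts[m.group(1)] += 1
        if m.group(1) == "OK":
            e = ELAPSED.search(line)
            if e:
                elapsed.append(float(e.group(1)))
    return {**counts, "max_elapsed_s": max(elapsed) if elapsed else None}

session = """\
2026-04-11T01:30:14.221Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
2026-04-11T01:34:51.778Z [generate] BATCH OK requested=8 elapsed=277.6s
2026-04-11T01:59:11.330Z [generate] BATCH TIMEOUT requested=8 model=DeepSeek-V3.2 configured_timeout=20min
"""
print(summarize(session))
# {'START': 1, 'OK': 1, 'TIMEOUT': 1, 'max_elapsed_s': 277.6}
```

If `max_elapsed_s` is close to `generation_timeout_minutes × 60`, raise the timeout or shrink the batch before the next run.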
