Commit 7d2d07c

fix(analyze): configurable analysis timeout + analysis_failed status
Spec post-038/041 fix uncovered a SECOND hardcoded timeout (2 min in
BehaviorAnalyzer, also since spec 009) plus a UX bug where failed analysis was
reported as status: "analyzed" with a fake recommended count of 15 (the
Generation.DefaultCount fallback). Symptom on slow / reasoning models
(DeepSeek-V3): the chat agent confidently presents "I recommend generating 15
test cases" while behaviors_found: 0 — it looks like a successful run, but the
analysis silently timed out at 2 minutes. The agent had no signal to retry or
warn.

AiConfig:
- New ai.analysis_timeout_minutes (default 2, minimum 1)

BehaviorAnalyzer.AnalyzeAsync:
- Reads the timeout from ai.analysis_timeout_minutes
- Surfaces the configured budget in the live status message
- Writes timestamped [analyze] lines to the same .spectra-debug.log used by
  the generator:
    ANALYSIS START documents=N model=M provider=P timeout=Tmin
    ANALYSIS OK behaviors=N response_chars=N elapsed=Ts
    ANALYSIS TIMEOUT model=M configured_timeout=Tmin elapsed=Ts
    ANALYSIS PARSE_FAIL response_chars=N elapsed=Ts
    ANALYSIS EMPTY response_chars=0 elapsed=Ts
    ANALYSIS ERROR exception=T message="..." elapsed=Ts
- Improved timeout status message names the model and points at the new
  config field
- New private DebugLog helper, gated by ai.debug_log_enabled

GenerateHandler --analyze-only path:
- Sets Status = "analysis_failed" (not "analyzed") when analysisResult is null
- Populates Message with a multi-line explanation: the model name, the fact
  that Recommended is a fallback default rather than a real analysis, the
  common cause (analysis_timeout_minutes too low), the remediation (bump to
  5–10 min), and a pointer to .spectra-debug.log
- Fixed a pre-existing spec 037 oversight: the analyze-only result was missing
  TechniqueBreakdown (the spec 037 replace_all matched 16-space indentation,
  but the analyze-only block uses 20 spaces). Now populated.

spectra-generate SKILL:
- Step 4 gains an analysis_failed case that tells the agent NOT to show a
  recommendation, to show the message verbatim, and to stop for user
  confirmation before proceeding to generation

Documentation:
- docs/configuration.md documents analysis_timeout_minutes alongside the
  existing tuning knobs, includes an [analyze] ANALYSIS START/OK example in
  the .spectra-debug.log format spec, and explains the new analysis_failed
  status
- PROJECT-KNOWLEDGE.md config example shows analysis_timeout_minutes: 2
- CLAUDE.md Recent Changes entry

All 1551 tests still pass — additive config field + null-result handling, no
test changes required.
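As a sketch of the slow-model tuning this commit enables — the field names are from this commit and v1.41.0; the values shown are illustrative slow-model settings, not the defaults:

```json
{
  "ai": {
    "providers": [
      { "name": "azure-openai", "model": "DeepSeek-V3.2", "enabled": true }
    ],
    "analysis_timeout_minutes": 10,
    "generation_timeout_minutes": 20,
    "generation_batch_size": 8,
    "debug_log_enabled": true
  }
}
```

The defaults remain 2 / 5 / 30 / true; typically only the two timeouts need raising for reasoning-class models.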
1 parent 4fd4926 commit 7d2d07c

7 files changed (+105 −24 lines changed)


CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -209,6 +209,7 @@ spectra config list-automation-dirs # List dirs with existence s
 - **Tests:** xUnit with structured results (never throw on validation errors)
 
 ## Recent Changes
+- v1.42.0 fix: configurable behavior analysis timeout + analysis_failed status. Spec post-038/041 fix uncovered a SECOND hardcoded timeout (BehaviorAnalyzer used 2 min, also since spec 009) plus a UX bug where failed analysis was reported as `status: "analyzed"` with a fake `recommended` count of 15 (the `Generation.DefaultCount` fallback). Symptom on slow / reasoning models (DeepSeek-V3): the chat agent confidently presents "I recommend generating 15 test cases" while `behaviors_found: 0` — looks like a successful run but the analysis silently timed out. New `ai.analysis_timeout_minutes` config field (default 2, minimum 1) controls the BehaviorAnalyzer SDK call timeout. `BehaviorAnalyzer.AnalyzeAsync` reads it from `_config.Ai.AnalysisTimeoutMinutes`, surfaces the configured budget in the live status message, and writes timestamped `[analyze]` lines to the same `.spectra-debug.log` as the generator: `ANALYSIS START documents=N model=M provider=P timeout=Tmin`, `ANALYSIS OK behaviors=N response_chars=N elapsed=Ts`, `ANALYSIS TIMEOUT model=M configured_timeout=Tmin elapsed=Ts`, `ANALYSIS PARSE_FAIL`, `ANALYSIS EMPTY`, `ANALYSIS ERROR exception=T message=...`. Improved timeout status message names the model and points users at the new config field. `GenerateHandler` analyze-only path now sets `Status = "analysis_failed"` (not `"analyzed"`) when `analysisResult` is null, and populates `Message` with a multi-line explanation: model name, the fact that `Recommended` is a fallback default not a real analysis, the common cause (`ai.analysis_timeout_minutes` too low), the remediation (bump to 5–10 min), and a pointer to `.spectra-debug.log`. Also fixed a pre-existing spec 037 oversight: the analyze-only result construction was missing `TechniqueBreakdown` (the spec 037 `replace_all` matched 16-space indentation but the analyze-only block uses 20 spaces) — now populated. `spectra-generate` SKILL Step 4 gains an `analysis_failed` case that tells the agent NOT to show a recommendation, show the `message` verbatim, and stop for user confirmation before proceeding. Documentation: `docs/configuration.md` documents `analysis_timeout_minutes` alongside the existing tuning knobs, includes a `[analyze] ANALYSIS START/OK/TIMEOUT` example in the `.spectra-debug.log` format spec, and explains the new `analysis_failed` status. `PROJECT-KNOWLEDGE.md` config example shows `analysis_timeout_minutes: 2`. All 1551 tests still pass (no test changes — additive config field + null-result handling).
 - v1.41.0 fix: configurable per-batch generation timeout + batch size + .spectra-debug.log. The 5-minute SDK timeout in `CopilotGenerationAgent.GenerateTestsAsync` (hardcoded since spec 009) is now configurable via three new optional `ai.*` fields: `generation_timeout_minutes` (default 5, minimum 1), `generation_batch_size` (default 30, ≤0 falls back to default), `debug_log_enabled` (default true). `GenerateHandler` reads `generation_batch_size` from config and renders "Batch N/M" in progress messages. `CopilotGenerationAgent` writes timestamped per-batch lines to `.spectra-debug.log` in the project root: `BATCH START requested=N model=M provider=P timeout=Tmin`, `BATCH OK requested=N elapsed=Ts`, `BATCH TIMEOUT requested=N model=M configured_timeout=Tmin`. Best-effort writes — never throws or blocks generation. Improved timeout error message names the actual model, batch size, and configured minutes, and lists three remediation paths (bump timeout, shrink batch, reduce --count) with copy-paste-ready spectra.config.json snippets, plus a pointer to `.spectra-debug.log`. Live status messages now include batch number and total ("Batch 3/13") plus the configured timeout. Symptom this fixes: with slower / reasoning-class models (DeepSeek-V3, GPT-4 Turbo with long contexts, large Azure Anthropic deployments), batches of 30 tests routinely exceeded the hardcoded 5-minute budget — `--count 100` would time out on the first batch even though `--count 1` worked. Workaround for users hitting the timeout is now config-only — no code change required. Documentation: `docs/configuration.md` gains a full `ai.generation_timeout_minutes` / `ai.generation_batch_size` / `ai.debug_log_enabled` section with example slow-model and fast-model configs and the `.spectra-debug.log` format spec; `PROJECT-KNOWLEDGE.md` config schema example shows the new fields with a tuning note. All 1551 tests still pass.
 - 039-unify-critic-providers: ✅ COMPLETE - Aligned the critic provider validation set with the generator provider list. `CriticFactory.SupportedProviders` now contains exactly the canonical 5 (`github-models`, `azure-openai`, `azure-anthropic`, `openai`, `anthropic`) — same as `ai.providers[].name`. Removed `azure-deepseek` (not in the generator set) and `google` (Copilot SDK cannot route to it). New private `LegacyAliases` dictionary maps `github` → `github-models` (soft alias with one-line stderr deprecation warning); new private `HardErrorProviders` set contains `google` and produces a hard validation error listing all 5 supported providers. New `internal static ResolveProvider(string?)` helper centralizes the lookup; called from both `TryCreate` and `TryCreateAsync` (in the async path BEFORE the Copilot availability check so unknown providers fail fast). New public `DefaultProvider = "github-models"` constant; empty/whitespace input falls back to it. Case-insensitive matching with lowercase normalization. `IsSupported` returns true for canonical names AND legacy `github`, false for `google`/unknown. `CriticConfig.Provider` XML doc updated to list the canonical 5; `GetEffectiveModel()` and `GetDefaultApiKeyEnv()` switch statements gain explicit cases for `github-models`, `azure-openai`, `azure-anthropic` (defaults: `gpt-4o-mini`/`gpt-4o-mini`/`claude-3-5-haiku-latest` and `GITHUB_TOKEN`/`AZURE_OPENAI_API_KEY`/`AZURE_ANTHROPIC_API_KEY`); legacy `github`/`google` cases retained for read-side safety. Existing `CriticFactoryTests`, `CriticAuthTests`, `VerificationIntegrationTests`, `CriticConfigTests` updated to drop `google` from "supported" assertions and add `azure-openai`/`azure-anthropic` to the canonical theory data. Docs updated: `docs/configuration.md` lists the canonical 5 with two new examples (Azure-only billing, cross-provider critic); `docs/grounding-verification.md` example uses `github-models` instead of `google`, supported list updated, legacy-values note added. New `CriticFactoryProviderTests.cs` with 8 tests (azure-openai accept, azure-anthropic accept, case-insensitive normalization, github→github-models alias with stderr warning, google hard error with 5-provider listing, unknown provider error, empty fallback, whitespace fallback). Backward compatible: all existing configurations on `openai`/`anthropic`/`github-models` continue to work unchanged. 1551 total tests passing (491 Core + 709 CLI + 351 MCP).
 - 038-testimize-integration: ✅ COMPLETE - Optional integration with the external Testimize.MCP.Server tool for algorithmic test data optimization (BVA, EP, pairwise, ABC). Disabled by default; zero impact on users who do not opt in. New `testimize` section in `spectra.config.json` (`enabled` defaults to `false`, `mode`, `strategy`, optional `settings_file`, `mcp.command`/`args`, optional `abc_settings`). New models in `src/Spectra.Core/Models/Config/TestimizeConfig.cs` (`TestimizeConfig`, `TestimizeMcpConfig`, `TestimizeAbcSettings`). New `SpectraConfig.Testimize` property defaulting to `new()`. New `src/Spectra.CLI/Agent/Testimize/TestimizeMcpClient.cs` — `IAsyncDisposable` wrapper around the Testimize child process with stdio JSON-RPC framing (`Content-Length` header), `StartAsync` (returns false on missing tool, never throws), `IsHealthyAsync` (5s budget), `CallToolAsync` (30s budget, returns null on any failure), idempotent `DisposeAsync` (kills child process tree). New `src/Spectra.CLI/Agent/Testimize/TestimizeTools.cs` with two `AIFunction` factories: `CreateGenerateTestDataTool` (forwards to Testimize MCP `generate_hybrid_test_cases`/`generate_pairwise_test_cases`/`generate_combinatorial_test_cases` based on strategy) and `CreateAnalyzeFieldSpecTool` (local heuristic extractor for "X to Y characters", "between X and Y", "valid email", "required field"). Tool description explicitly mentions "boundary values", "equivalence classes", and "pairwise" so the model knows when to call them. New `TestimizeDetector.IsInstalledAsync` shells out to `dotnet tool list -g` (5s timeout, never throws). `CopilotGenerationAgent.GenerateTestsAsync` conditionally registers the Testimize tools after the existing 7 tools when `_config.Testimize.Enabled` and the MCP client passes a health probe; the client is disposed in a `finally` block on every exit path (success, exception, cancellation). `BehaviorAnalyzer.BuildAnalysisPrompt` and `GenerationAgent.BuildFullPrompt` populate a new `testimize_enabled` placeholder (`"true"` or `""`). `behavior-analysis.md` and `test-generation.md` templates gain `{{#if testimize_enabled}}` blocks with technique-aware instructions. New `spectra testimize check` CLI command (`src/Spectra.CLI/Commands/Testimize/TestimizeCheckHandler.cs` + `TestimizeCommand.cs`) reports enabled/installed/healthy/mode/strategy/settings status, supports `--output-format json`, never starts the MCP process when disabled (FR-028), includes install instruction when not installed. New `TestimizeCheckResult` typed result. Embedded `Templates/spectra.config.json` updated to include the `testimize` section with `enabled: false` so `spectra init` writes it by default. Existing `spectra init` already refuses to overwrite an existing config without `--force`, so re-init preservation is automatic. Graceful degradation everywhere: tool not installed → warning + AI fallback; process crashes mid-generation → catch + AI fallback; server returns null/empty/garbage → AI fallback; per-call 30s timeout → AI fallback. Exit code 0 in all degradation paths. New tests: `TestimizeConfigTests` (6), `TestimizeMcpClientTests` (4), `TestimizeMcpClientGracefulTests` (4), `TestimizeToolsTests` (2), `AnalyzeFieldSpecTests` (7), `TestimizeDetectorTests` (1), `TestimizeConditionalBlockTests` (4), `TestimizeCheckHandlerTests` (3) — 31 net new tests (488 Core + 699 CLI + 351 MCP = 1538 total passing).
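For illustration, the `analysis_failed` result described in the v1.42.0 entry above might look roughly like this — `status`, `behaviors_found`, `recommended`, and `message` are named in the commit, but the exact JSON shape and the message wording here are assumptions, not the literal output:

```json
{
  "status": "analysis_failed",
  "behaviors_found": 0,
  "recommended": 15,
  "message": "Behavior analysis on DeepSeek-V3.2 timed out.\nThe recommended count is a fallback default (Generation.DefaultCount), not a real analysis.\nLikely cause: ai.analysis_timeout_minutes too low — raise it to 5-10.\nSee .spectra-debug.log for timing details."
}
```

The SKILL's Step 4 case keys off `status` alone: anything other than `"analyzed"` means the `recommended` number must not be presented as a real recommendation.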

PROJECT-KNOWLEDGE.md

Lines changed: 16 additions & 7 deletions
@@ -286,6 +286,7 @@ Three-section unified coverage with distinct semantics:
     "ai": {
       "providers": [{ "name": "azure-anthropic", "model": "claude-sonnet-4-5", "enabled": true }],
       "critic": { "provider": "azure-anthropic", "model": "claude-sonnet-4-5" },
+      "analysis_timeout_minutes": 2,
       "generation_timeout_minutes": 5,
       "generation_batch_size": 30,
       "debug_log_enabled": true
@@ -304,13 +305,21 @@ Three-section unified coverage with distinct semantics:
   }
 ```
 
-> **Generation tuning (v1.41.0)**: `ai.generation_timeout_minutes` (default 5)
-> and `ai.generation_batch_size` (default 30) are the per-batch knobs for
-> `spectra ai generate`. Slower / reasoning models (DeepSeek-V3, large Azure
-> deployments) typically need 15–20 min and batches of 6–10.
-> `ai.debug_log_enabled` (default true) writes per-batch timing diagnostics to
-> `.spectra-debug.log` in the project root — inspect this file to dial the
-> tuning knobs to your model's actual throughput.
+> **Generation & analysis tuning (v1.42.0)**:
+> - `ai.analysis_timeout_minutes` (default 2) — behavior analysis timeout
+> - `ai.generation_timeout_minutes` (default 5) — per-batch generation timeout
+> - `ai.generation_batch_size` (default 30) — tests per AI call
+> - `ai.debug_log_enabled` (default true) — writes per-call timing diagnostics
+>   to `.spectra-debug.log` in the project root
+>
+> Slower / reasoning models (DeepSeek-V3, large Azure deployments) typically
+> need: analysis 5–10 min, generation 15–20 min, batches of 6–10.
+>
+> When behavior analysis fails (timeout, parse error, empty response), the
+> `.spectra-result.json` from `--analyze-only` now sets `status: "analysis_failed"`
+> and a `message` field explaining the cause + remediation. The `spectra-generate`
+> SKILL recognizes this and refuses to present the fallback default (15) as if
+> it were a real recommendation.
 
 > **Spec 039**: critic providers now use the same five names as the generator
 > (`github-models`, `azure-openai`, `azure-anthropic`, `openai`, `anthropic`).
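The tuning note above says to dial the knobs from the model's actual throughput; a minimal shell sketch of that workflow follows. The log line format matches the commit's `.spectra-debug.log` format spec, but the sample timestamps and the grep/awk pipeline are illustrative, not part of Spectra:

```shell
# Create a sample log in the documented .spectra-debug.log line format
cat > .spectra-debug.log <<'EOF'
2026-04-11T01:33:47.219Z [analyze] ANALYSIS OK behaviors=75 response_chars=18402 elapsed=226.2s
2026-04-11T01:38:25.778Z [generate] BATCH OK requested=8 elapsed=277.6s
EOF

# Print "<tag> elapsed=..." for every successful analyze/generate call,
# so measured elapsed time can be compared against the configured budgets
grep -E 'ANALYSIS OK|BATCH OK' .spectra-debug.log \
  | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^elapsed=/) print $2, $i }'
```

An analysis consistently near its budget is the cue to raise `ai.analysis_timeout_minutes`; batches near theirs, to raise `ai.generation_timeout_minutes` or shrink `ai.generation_batch_size`.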

docs/configuration.md

Lines changed: 25 additions & 11 deletions
@@ -116,24 +116,35 @@ Azure Anthropic deployments, GPT-4 Turbo with long contexts) where the default
 
 | Property | Type | Default | Description |
 |----------|------|---------|-------------|
-| `generation_timeout_minutes` | int | `5` | Per-batch SDK call timeout. The timer measures the entire batch round-trip including all tool calls the AI makes. Slower models may need 10–20+ minutes. Minimum effective value is 1. |
+| `analysis_timeout_minutes` | int | `2` | Timeout for the **behavior analysis** SDK call (the analyze step that runs before generation). Slower models routinely overshoot 2 minutes when scanning a multi-document suite — bump to 5–10 minutes for those. The same timer applies to the retry attempt. |
+| `generation_timeout_minutes` | int | `5` | Per-batch **generation** SDK call timeout. The timer measures the entire batch round-trip including all tool calls the AI makes. Slower models may need 10–20+ minutes. Minimum effective value is 1. |
 | `generation_batch_size` | int | `30` | Number of tests requested per AI call. Smaller batches reduce per-batch latency on slow models at the cost of more total round-trips. Pair with `generation_timeout_minutes`. Values ≤ 0 fall back to the default. |
-| `debug_log_enabled` | bool | `true` | When true, appends per-batch diagnostics to `.spectra-debug.log` in the project root. Best-effort writes; never blocks generation. Set to `false` to silence. |
+| `debug_log_enabled` | bool | `true` | When true, appends per-batch diagnostics to `.spectra-debug.log` in the project root. Best-effort writes; never blocks analysis or generation. Set to `false` to silence. |
+
+**Why a separate analysis timeout?** Behavior analysis runs once before generation and tends to be a single big call (no tool-calling loop). With slow / reasoning models on multi-doc suites it often takes 3–7 minutes — well over the 2-minute default — and fails silently. The symptom is `behaviors_found: 0` with a `recommended` count that looks plausible but is actually a hardcoded fallback default. v1.42.0+ surfaces this clearly: when analysis fails, the result file's `status` is set to `"analysis_failed"` (not `"analyzed"`) and the `message` field explains the cause and remediation. The bundled `spectra-generate` SKILL recognizes the new status and stops the agent from confidently presenting fallback numbers as a real recommendation.
 
 #### `.spectra-debug.log` format
 
 When `debug_log_enabled` is true, every batch writes one or more timestamped
 lines to `.spectra-debug.log`. Example session:
 
 ```text
-2026-04-11T01:30:14.221Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
-2026-04-11T01:34:51.778Z [generate] BATCH OK requested=8 elapsed=277.6s
-2026-04-11T01:34:53.014Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
-2026-04-11T01:39:10.004Z [generate] BATCH OK requested=8 elapsed=257.0s
-2026-04-11T01:39:11.221Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
-2026-04-11T01:59:11.330Z [generate] BATCH TIMEOUT requested=8 model=DeepSeek-V3.2 configured_timeout=20min
+2026-04-11T01:30:01.045Z [analyze] ANALYSIS START documents=12 model=DeepSeek-V3.2 provider=azure-openai timeout=10min
+2026-04-11T01:33:47.219Z [analyze] ANALYSIS OK behaviors=75 response_chars=18402 elapsed=226.2s
+2026-04-11T01:33:48.221Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
+2026-04-11T01:38:25.778Z [generate] BATCH OK requested=8 elapsed=277.6s
+2026-04-11T01:38:27.014Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
+2026-04-11T01:42:44.004Z [generate] BATCH OK requested=8 elapsed=257.0s
+2026-04-11T01:42:45.221Z [generate] BATCH START requested=8 model=DeepSeek-V3.2 provider=azure-openai timeout=20min
+2026-04-11T02:02:45.330Z [generate] BATCH TIMEOUT requested=8 model=DeepSeek-V3.2 configured_timeout=20min
 ```
 
+The `[analyze]` lines come from `BehaviorAnalyzer`; the `[generate]` lines come
+from `CopilotGenerationAgent`. `ANALYSIS OK` and `BATCH OK` record success and
+elapsed wall time. `ANALYSIS TIMEOUT`, `ANALYSIS PARSE_FAIL`, `ANALYSIS EMPTY`,
+`ANALYSIS ERROR`, and `BATCH TIMEOUT` record failures with the configured
+budget for cross-reference.
+
 Lines starting with `BATCH START` mark the beginning of an AI call (model,
 provider, batch size, configured timeout). `BATCH OK` marks success and
 records the elapsed wall time. `BATCH TIMEOUT` marks a timeout failure with
@@ -155,16 +166,19 @@ for the current batch. Useful when the AI returns a malformed JSON array.
     "providers": [
       { "name": "azure-openai", "model": "DeepSeek-V3.2", "enabled": true }
     ],
+    "analysis_timeout_minutes": 10,
     "generation_timeout_minutes": 20,
     "generation_batch_size": 8,
     "debug_log_enabled": true
   }
 }
 ```
 
-With this config, `spectra ai generate --count 100` splits into 13 batches
-of 8 tests each, every batch has a 20-minute budget, and each batch's actual
-elapsed time is logged so you can re-tune later.
+With this config, behavior analysis gets a 10-minute budget (DeepSeek typically
+finishes a multi-doc scan in 3–7 minutes), and `spectra ai generate --count 100`
+splits into 13 batches of 8 tests each, every batch with a 20-minute budget.
+Every analyze and generate call's actual elapsed time is logged so you can
+re-tune later.
 
 #### Example: silencing the debug log on a fast model
 
170184
