You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
-**`ai.critic.max_concurrent` config option** (default `1`, max `20`) — runs the critic verification phase in parallel using a `SemaphoreSlim` throttle. With `max_concurrent: 5` on a 200-test suite, the critic phase typically completes in ~1/5 of sequential time. Output is unchanged: results are written into a pre-sized array indexed by original position, so test files, indexes, and verdicts come back in the same order regardless of completion order. Manual-verdict short-circuit is preserved. Default of `1` keeps the byte-identical sequential path.
13
+
-**`.spectra-errors.log` dedicated error log** — companion to `.spectra-debug.log`. Written only when `debug.enabled: true` AND at least one error occurs during the run (lazy file creation; clean runs leave it untouched). Captures full exception type, message, response body (truncated to 500 chars), `Retry-After` header, and stack trace. New `debug.error_log_file` config (default `.spectra-errors.log`) and matching `mode` (append/overwrite) following the debug log. Errors are captured at every catch site that talks to the AI runtime: `BehaviorAnalyzer`, `CopilotGenerationAgent`, `CopilotCritic`, `CriteriaExtractor` (via `AnalyzeHandler`), and `UpdateHandler`. Each captured error gets a `see=.spectra-errors.log` cross-reference suffix on the corresponding debug log line.
14
+
-**Rate limit + error counts in Run Summary** — new `Errors`, `Rate limits`, and `Critic concurrency` rows in the Spectre.Console panel. When `Rate limits > 0`, a hint `(consider reducing ai.critic.max_concurrent)` appears next to the count. The same counts suffix the `RUN TOTAL` debug log line as ` rate_limits=<n> errors=<n>` (always present, even on zero runs, for grep-friendly CI consumption).
15
+
-**`ErrorLogger` static class** at `src/Spectra.CLI/Infrastructure/ErrorLogger.cs` — mirrors `DebugLogger`'s static-class pattern. Thread-safe via single lock. File-write failures are caught, emit one stderr warning, then disable further writes for the run (graceful degradation; never aborts the run). Includes `IsRateLimit(Exception)` classifier (HTTP 429 / `HttpStatusCode.TooManyRequests` / message-substring fallbacks for `429`, `rate limit`, `too many requests`, `rate_limit_exceeded`).
16
+
-**`RunErrorTracker` instance class** at `src/Spectra.CLI/Services/RunErrorTracker.cs` — per-run counters with `Errors` + `RateLimits` properties; thread-safe via `Interlocked`. Lives on `GenerateHandler` and `UpdateHandler` and is forwarded into `BehaviorAnalyzer`, `CopilotGenerationAgent`, `CriticFactory.TryCreate`, and `CopilotCritic` constructor.
17
+
-**+14 new tests** — `CriticConfigClampTests` (clamping ≤0→1, >20→20, in-range pass-through, JSON roundtrip), `ErrorLoggerTests` (file lazy-creation, disabled no-op, stack trace + response body capture, response truncation at 500, retry-after, multi-error append, concurrent-write thread safety, `IsRateLimit` classifier), `RunErrorTrackerTests` (counter atomicity under 1000-task contention), and 2 new `RunSummaryDebugFormatterTests` cases for the error/rate-limit suffixes.
18
+
19
+
### Changed
20
+
-`RunSummaryDebugFormatter.FormatRunTotal` gained an optional `RunErrorTracker` parameter; line now always ends with ` rate_limits=<n> errors=<n>`.
21
+
-`RunSummaryPresenter.Render` gained optional `errorTracker` and `criticConcurrency` parameters that drive the new rows.
22
+
-`CopilotCritic.VerifyTestAsync` distinguishes rate-limit failures from generic exceptions in the debug log (`CRITIC RATE_LIMITED` vs. `CRITIC ERROR`).
23
+
-`CriticFactory.TryCreate` and `TryCreateAsync` accept an optional `RunErrorTracker` parameter and forward it to `CopilotCritic`.
24
+
-`AgentFactory.CreateAgentAsync` accepts an optional `RunErrorTracker` parameter and forwards it to `CopilotGenerationAgent`.
25
+
26
+
### Notes
27
+
-`max_concurrent > 10` emits a one-line stderr warning at run start about provider rate-limit risk. `max_concurrent > 20` emits a "clamped to 20" warning.
28
+
- The error log uses lazy file creation: on a clean run, no `.spectra-errors.log` file is touched, so its mere existence indicates "this run had at least one failure".
29
+
- All existing tests still pass (513 Core / 351 MCP / 827 CLI). Adds 14 new tests.
Copy file name to clipboardExpand all lines: CLAUDE.md
+1Lines changed: 1 addition & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -209,6 +209,7 @@ spectra config list-automation-dirs # List dirs with existence s
209
209
-**Tests:** xUnit with structured results (never throw on validation errors)
210
210
211
211
## Recent Changes
212
+
- v1.48.0 (Spec 043): parallel critic verification + dedicated error log. New `ai.critic.max_concurrent` (default 1, max 20) parallelizes the critic loop in `GenerateHandler.VerifyTestsAsync` via `SemaphoreSlim` + `Task.WhenAll` with results collected into a pre-sized array indexed by original position (sequential path is preserved when `max_concurrent <= 1`, byte-identical to pre-spec behavior). `max_concurrent: 5` cuts critic phase to ~1/5 of sequential time on a 200-test suite. New `.spectra-errors.log` file (path via `debug.error_log_file`, default `.spectra-errors.log`) captures full exception type/message/response body (truncated 500 chars)/`Retry-After`/stack trace at every catch site (analyze/generate/critic/criteria/update); created lazily on first error so clean runs leave no file. New `ErrorLogger` static class at `src/Spectra.CLI/Infrastructure/ErrorLogger.cs` (mirrors `DebugLogger`, thread-safe via single lock, graceful stderr-warning fallback on file failure with `IsRateLimit` classifier for HTTP 429 / message-substring fallbacks). New `RunErrorTracker` instance class at `src/Spectra.CLI/Services/RunErrorTracker.cs` (thread-safe `Errors` + `RateLimits` counters via `Interlocked`). Run Summary panel gains `Critic concurrency`, `Errors`, `Rate limits` rows (with `(consider reducing ai.critic.max_concurrent)` hint when rate-limited). `RUN TOTAL` debug log line always ends with ` rate_limits=<n> errors=<n>`. `CopilotCritic.VerifyTestAsync` classifies rate limits separately (`CRITIC RATE_LIMITED` vs. `CRITIC ERROR`). `CriticFactory.TryCreate`/`AgentFactory.CreateAgentAsync`/`BehaviorAnalyzer`/`CopilotGenerationAgent` all gained an optional `RunErrorTracker` parameter. Values >10 emit a one-line stderr warning at run start about rate-limit risk; >20 clamp to 20 with a warning. +14 tests (`CriticConfigClampTests`, `ErrorLoggerTests`, `RunErrorTrackerTests`, expanded `RunSummaryDebugFormatterTests`).
212
213
- v1.46.1: removed the bogus 3-second TESTIMIZE UNHEALTHY grace window. The Copilot CLI's MCP client takes ~60s to attempt + time out an initialize handshake against a non-spec-compliant server, so the 3s wait always fired a false UNHEALTHY before the real SessionMcpServersLoadedEvent came through. Fix: trust the SDK event entirely — when it fires (success or failure), log the real status with the real error message. Companion fix in `Testimize.MCP.Server` (separate repo): the server was emitting JSON-RPC responses with both `"result":{...}` AND `"error":null`, which is a JSON-RPC 2.0 spec violation — the Copilot CLI's MCP client correctly rejected these as malformed and timed out with `MCP error -32001: Request timed out`. Updated `Testimize.MCP.Server/Program.cs` to serialize only the applicable field per spec. Requires rebuild + reinstall of testimize-mcp.
213
214
- v1.46.0: Testimize integration migrated to the Copilot SDK's native `SessionConfig.McpServers` API. The SDK owns spawn, initialize handshake, framing, tool discovery, and lifecycle — deleted the custom `TestimizeMcpClient` (which had a broken Content-Length framing assumption that deadlocked against testimize-mcp's newline-delimited protocol) and `TestimizeTools.CreateGenerateTestDataTool` wrapper. New `McpConfigBuilder.BuildTestimizeServer` helper translates `TestimizeConfig` → `McpLocalServerConfig`. Generation agent subscribes to `SessionMcpServersLoadedEvent` / `SessionMcpServerStatusChangedEvent` and logs `TESTIMIZE CONFIGURED / LOADED / STATUS_CHANGED / DISPOSED` lines with the SDK's status enum (`Connected / Failed / NeedsAuth / Pending / Disabled / NotConfigured`). 3-second defensive grace window before falling back. `{{testimize_strategy}}` placeholder added to `test-generation.md` so the AI knows to prefer `testimize/generate_hybrid_test_cases` (default) or `testimize/generate_pairwise_test_cases`. `AnalyzeFieldSpec` moved to new `FieldSpecAnalysisTools.cs` (no behavior change). User-facing `testimize.*` config schema is unchanged.
214
215
- v1.45.2: run header block at the start of every `.spectra-debug.log` run (`=== SPECTRA v<version> | <command line> | <UTC timestamp> ===` preceded by a 62-char horizontal separator). New `debug.mode` config (`append` default / `overwrite`) controls whether the file is truncated or prepended at run start. New `DebugLogger.BeginRun()` wired from `GenerateHandler` and `UpdateHandler`. Version pulled from `AssemblyInformationalVersionAttribute`, command line from `Environment.GetCommandLineArgs()`. **Deleted**`.spectra-debug-analysis.txt` and `.spectra-debug-response.txt` writers — all debug output now lives in `.spectra-debug.log` only. `SaveDebugResponse` method removed from `GenerationAgent`. Testimize lifecycle lines survive the overwrite-mode truncate (regression-tested).
**Speeding up the critic phase** (Spec 043, v1.48.0+): on a large generate run, the critic verifies tests one at a time and is typically the dominant cost (~6s/call × N tests). Set `ai.critic.max_concurrent: 5` (or higher, max 20) in `spectra.config.json` to run multiple critic calls in parallel — this typically cuts critic phase time to ~1/N of sequential without changing any output. The Run Summary panel shows the active `Critic concurrency` along with `Errors` and `Rate limits` counts. If you see `Rate limits > 0`, drop `max_concurrent` and re-run.
209
+
210
+
**Troubleshooting failed runs** (Spec 043, v1.48.0+): when `debug.enabled: true`, every failed AI call (timeout, HTTP 429, parse error, MCP failure, generic exception) writes a full entry to `.spectra-errors.log` — exception type, message, response body (truncated to 500 chars), `Retry-After` header, and stack trace. The file is created lazily on first error and never touched on healthy runs, so a single `cat .spectra-errors.log` answers "did anything go wrong?". Each error gets a `see=.spectra-errors.log` cross-reference in `.spectra-debug.log` so you can jump from the timeline to the full context.
211
+
208
212
**Tuning for slow models** (v1.41.0+): generation runs in batches of `ai.generation_batch_size` tests (default 30), each subject to `ai.generation_timeout_minutes` (default 5). Slower / reasoning-class models (DeepSeek-V3, large Azure deployments, GPT-4 Turbo with long contexts) typically need `generation_timeout_minutes: 15–20` and `generation_batch_size: 6–10`. When `ai.debug_log_enabled` is true (the default), each batch writes a timestamped `BATCH START` / `BATCH OK` / `BATCH TIMEOUT` line to `.spectra-debug.log` in the project root with the model, provider, requested count, and elapsed seconds — inspect this file to dial the knobs precisely. See [Configuration → ai.generation_timeout_minutes](configuration.md) for the full reference, example configs, and the timeout error message format.
Copy file name to clipboardExpand all lines: docs/configuration.md
+46Lines changed: 46 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -248,6 +248,25 @@ Grounding verification configuration. See [Grounding Verification](grounding-ver
248
248
|`api_key_env`| string | — | Environment variable for critic API key |
249
249
|`base_url`| string | — | Custom API endpoint |
250
250
|`timeout_seconds`| int |`30`| Critic request timeout |
251
+
|`max_concurrent`| int |`1`|**Spec 043:** number of concurrent critic verification calls. `1` (default) preserves the original sequential behavior. Higher values run multiple critic calls in parallel via a `SemaphoreSlim` throttle and dramatically reduce critic phase time on large suites. Clamped to `[1, 20]`. Values >10 emit a rate-limit-risk warning at run start. |
252
+
253
+
#### Critic concurrency tuning (Spec 043)
254
+
255
+
The critic phase is typically the dominant cost on large generate runs (one
256
+
sequential API call per test). Setting `max_concurrent` higher parallelizes
257
+
those calls without changing any output. Test files, verdicts, and indexes
258
+
are written in the original input order regardless of completion order.
If you start hitting provider rate limits, the Run Summary panel surfaces a
268
+
`Rate limits` count with a hint pointing back at this knob. Drop the value
269
+
and re-run.
251
270
252
271
#### Example: Azure-only billing (generator + critic on the same account)
253
272
@@ -456,8 +475,35 @@ logging, eliminating stale-file accumulation in CI environments.
456
475
|----------|------|---------|-------------|
457
476
|`enabled`| bool |`false`| When true, every AI call (analyze, generate, critic, criteria extraction) writes a timestamped line to the debug log. Non-AI lifecycle lines (testimize) are also written. Best-effort; never blocks the calling code. |
458
477
|`log_file`| string |`".spectra-debug.log"`| Path to the log file. Relative paths resolve against the repo root. |
478
+
|`error_log_file`| string |`".spectra-errors.log"`|**Spec 043:** path to the dedicated error log. Written only when `enabled` is `true` AND at least one error occurs during the run. On a clean run the file is not created or modified. Captures full exception type, message, response body (truncated to 500 chars), `Retry-After` header, and stack trace per failure. Follows the same `mode` semantics as `log_file`. |
459
479
|`mode`| string |`"append"`| Controls how the file is opened at run start. `"append"` prepends a separator + header block before each new run (useful for comparing multiple runs). `"overwrite"` truncates the file and writes just the header (keeps only the latest run). Any other value falls back to `"append"`. |
460
480
481
+
### Error log (Spec 043)
482
+
483
+
The error log is the companion to the debug log. Where the debug log is
484
+
high-volume (one line per AI call, hundreds per run), the error log is
485
+
zero-volume on a healthy run and only fills up when something actually
486
+
fails. A single `cat .spectra-errors.log` answers "did anything go wrong?"
Copy file name to clipboardExpand all lines: docs/grounding-verification.md
+4-1Lines changed: 4 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -90,12 +90,15 @@ Configure the critic in `spectra.config.json`:
90
90
"enabled": true,
91
91
"provider": "github-models",
92
92
"model": "gpt-5-mini",
93
-
"timeout_seconds": 120
93
+
"timeout_seconds": 120,
94
+
"max_concurrent": 5
94
95
}
95
96
}
96
97
}
97
98
```
98
99
100
+
> **Spec 043 — parallel verification:** `max_concurrent` (default `1`) controls how many critic verification calls run concurrently. Setting it to `5` typically cuts the critic phase to ~1/5 of sequential time on a large suite without changing any output (results are written in original input order). Clamped to `[1, 20]`. Values >10 emit a rate-limit-risk warning at run start. If you start hitting rate limits, the Run Summary panel surfaces a `Rate limits` count with a hint pointing back at this knob.
101
+
99
102
> **Spec 041:** `gpt-5-mini` is the new default critic model (was `gpt-4o-mini`). It's included free on any paid Copilot plan and, when paired with a `gpt-4.1` generator, provides cross-architecture verification without burning premium requests. For Claude generators, the default critic rotates to `claude-haiku-4-5`. Per-provider defaults resolved by `CriticConfig.GetEffectiveModel()`: `github-models` / `openai` / `azure-openai` → `gpt-5-mini`; `anthropic` / `azure-anthropic` → `claude-haiku-4-5`.
100
103
101
104
Supported critic providers (spec 039 — same set as the generator):
**Purpose**: Validate specification completeness and quality before proceeding to planning
4
+
**Created**: 2026-04-11
5
+
**Feature**: [spec.md](../spec.md)
6
+
7
+
## Content Quality
8
+
9
+
-[x] No implementation details (languages, frameworks, APIs)
10
+
-[x] Focused on user value and business needs
11
+
-[x] Written for non-technical stakeholders
12
+
-[x] All mandatory sections completed
13
+
14
+
## Requirement Completeness
15
+
16
+
-[x] No [NEEDS CLARIFICATION] markers remain
17
+
-[x] Requirements are testable and unambiguous
18
+
-[x] Success criteria are measurable
19
+
-[x] Success criteria are technology-agnostic (no implementation details)
20
+
-[x] All acceptance scenarios are defined
21
+
-[x] Edge cases are identified
22
+
-[x] Scope is clearly bounded
23
+
-[x] Dependencies and assumptions identified
24
+
25
+
## Feature Readiness
26
+
27
+
-[x] All functional requirements have clear acceptance criteria
28
+
-[x] User scenarios cover primary flows
29
+
-[x] Feature meets measurable outcomes defined in Success Criteria
30
+
-[x] No implementation details leak into specification
31
+
32
+
## Notes
33
+
34
+
- The spec intentionally references concrete config keys (`ai.critic.max_concurrent`, `debug.error_log_file`) and class names (`ErrorLogger`, `RunErrorTracker`) in the Key Entities section because they form a stable user-facing contract — config keys ship to users and are part of the requirement, not implementation detail.
0 commit comments