Skip to content

Commit a088a03

Browse files
fix(v1.46.1): remove bogus 3s TESTIMIZE UNHEALTHY grace window
The 3-second grace in GenerationAgent was based on a wrong assumption about how fast the Copilot CLI's MCP client handshakes with a server. In reality it takes ~60s to attempt + time out an initialize against a non-spec-compliant server (JSON-RPC error -32001). Our grace fired well before the real SessionMcpServersLoadedEvent came through, producing a false UNHEALTHY line. Then the real event arrived ~60s later with status=Failed and the actual error message. Fix: trust the SDK event entirely. When it fires (success or failure), log the real status with the real error. Session creation is non-blocking with respect to MCP loading, so generation starts immediately and the MCP handshake runs in parallel. ## Separate fix (Testimize.MCP.Server, not this repo) The underlying reason the CLI's -32001 timeout fires at all: testimize was emitting responses with both "result":{...} AND "error":null, which is a JSON-RPC 2.0 spec violation. Strict MCP clients reject malformed responses and wait forever for a valid one. Updated testimize's Program.cs to serialize only the applicable field per spec (requires rebuild + reinstall of testimize-mcp 1.1.10). ## Expected debug log after both fixes [generate] TESTIMIZE CONFIGURED server=testimize command=testimize-mcp ... [generate] TESTIMIZE LOADED server=testimize status=Connected source=... [generate] TESTIMIZE DISPOSED (session closed) No more false UNHEALTHY line. LOADED fires with Connected instead of Failed once testimize-mcp's JSON-RPC output is spec-compliant. Tests: 1647/1647.
1 parent 8c6226c commit a088a03

File tree

3 files changed

+21
-27
lines changed

3 files changed

+21
-27
lines changed

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,16 @@ All notable changes to this project will be documented in this file.
55
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
66
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
77

8+
## [1.46.1] - 2026-04-11
9+
10+
### Fixed
11+
- **False `TESTIMIZE UNHEALTHY no_load_event_within_3s` line in debug log.** The 3-second grace window in `GenerationAgent` was based on a wrong assumption about how fast the Copilot CLI's MCP client handshakes with a server — in reality it takes ~60s to attempt + time out an initialize against a non-spec-compliant server. Removed the grace window entirely. The SDK's `SessionMcpServersLoadedEvent` is the single source of truth; when it fires (success or failure), the real status is logged with the real error message. Generation proceeds immediately because the SDK does not block session creation on MCP loading.
12+
- **Session-companion fix in `Testimize.MCP.Server`** (separate repo): the server previously emitted JSON-RPC responses with both `"result"` and `"error":null`, which is a [JSON-RPC 2.0 spec violation](https://www.jsonrpc.org/specification#response_object) — the spec says a response MUST contain `result` OR `error`, not both. The Copilot CLI's MCP client (built on `@modelcontextprotocol/sdk`) correctly rejects malformed responses and times out at ~60s with `MCP error -32001: Request timed out`. Fix: serialize only the field that applies. See `Testimize.MCP.Server/Program.cs` change; requires rebuilding testimize-mcp 1.1.10 from source and reinstalling.
13+
14+
### Notes
15+
- **Zero code-path changes beyond the 3s grace removal** — token tracking, debug log formatting, run summary, `RUN TOTAL` line, and the native `SessionConfig.McpServers` wiring from v1.46.0 are all untouched.
16+
- The SDK's behavior (blocking vs non-blocking session creation on MCP load) was empirically verified by observing the timing in `.spectra-debug.log`: session creation completed immediately, generation ran in parallel with the MCP handshake, and the load event arrived ~68s later with `status=Failed`.
17+
818
## [1.46.0] - 2026-04-11
919

1020
### Added

CLAUDE.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -209,6 +209,7 @@ spectra config list-automation-dirs # List dirs with existence s
209209
- **Tests:** xUnit with structured results (never throw on validation errors)
210210

211211
## Recent Changes
212+
- v1.46.1: removed the bogus 3-second TESTIMIZE UNHEALTHY grace window. The Copilot CLI's MCP client takes ~60s to attempt + time out an initialize handshake against a non-spec-compliant server, so the 3s wait always fired a false UNHEALTHY before the real SessionMcpServersLoadedEvent came through. Fix: trust the SDK event entirely — when it fires (success or failure), log the real status with the real error message. Companion fix in `Testimize.MCP.Server` (separate repo): the server was emitting JSON-RPC responses with both `"result":{...}` AND `"error":null`, which is a JSON-RPC 2.0 spec violation — the Copilot CLI's MCP client correctly rejected these as malformed and timed out with `MCP error -32001: Request timed out`. Updated `Testimize.MCP.Server/Program.cs` to serialize only the applicable field per spec. Requires rebuild + reinstall of testimize-mcp.
212213
- v1.46.0: Testimize integration migrated to the Copilot SDK's native `SessionConfig.McpServers` API. The SDK owns spawn, initialize handshake, framing, tool discovery, and lifecycle — deleted the custom `TestimizeMcpClient` (which had a broken Content-Length framing assumption that deadlocked against testimize-mcp's newline-delimited protocol) and `TestimizeTools.CreateGenerateTestDataTool` wrapper. New `McpConfigBuilder.BuildTestimizeServer` helper translates `TestimizeConfig` → `McpLocalServerConfig`. Generation agent subscribes to `SessionMcpServersLoadedEvent` / `SessionMcpServerStatusChangedEvent` and logs `TESTIMIZE CONFIGURED / LOADED / STATUS_CHANGED / DISPOSED` lines with the SDK's status enum (`Connected / Failed / NeedsAuth / Pending / Disabled / NotConfigured`). 3-second defensive grace window before falling back. `{{testimize_strategy}}` placeholder added to `test-generation.md` so the AI knows to prefer `testimize/generate_hybrid_test_cases` (default) or `testimize/generate_pairwise_test_cases`. `AnalyzeFieldSpec` moved to new `FieldSpecAnalysisTools.cs` (no behavior change). User-facing `testimize.*` config schema is unchanged.
213214
- v1.45.2: run header block at the start of every `.spectra-debug.log` run (`=== SPECTRA v<version> | <command line> | <UTC timestamp> ===` preceded by a 62-char horizontal separator). New `debug.mode` config (`append` default / `overwrite`) controls whether the file is truncated or prepended at run start. New `DebugLogger.BeginRun()` wired from `GenerateHandler` and `UpdateHandler`. Version pulled from `AssemblyInformationalVersionAttribute`, command line from `Environment.GetCommandLineArgs()`. **Deleted** `.spectra-debug-analysis.txt` and `.spectra-debug-response.txt` writers — all debug output now lives in `.spectra-debug.log` only. `SaveDebugResponse` method removed from `GenerationAgent`. Testimize lifecycle lines survive the overwrite-mode truncate (regression-tested).
214215
- v1.45.1 (Spec 040 follow-up): real token counts at every AI call site via `AssistantUsageEvent` subscription through new `CopilotUsageObserver`. Falls back to `text.Length / 4` via new `TokenEstimator` when the usage event misses the 200ms grace window. Fallback values are flagged with `~` prefix in debug log (`~tokens_in=` / `~tokens_out=`) and `"estimated": true` on `token_usage` in JSON. New `RUN TOTAL` summary line emitted to `.spectra-debug.log` at the end of every generate/update run via `RunSummaryDebugFormatter` — format `RUN TOTAL command=… suite=… calls=… tokens_in=… tokens_out=… elapsed=… phases=analysis:1/18.5s,generation:3/52.3s,critic:20/1m34s`. Discovered during implementation: `AssistantUsageData.InputTokens`/`OutputTokens` are actually `double?` in the SDK assembly (XML doc said `int`) — cast via `(int)(v ?? 0)`.

src/Spectra.CLI/Agent/Copilot/GenerationAgent.cs

Lines changed: 10 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -141,10 +141,6 @@ public async Task<GenerationResult> GenerateTestsAsync(
141141

142142
// Track tool calls and schedule delayed composing message
143143
using var timerCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
144-
// v1.46.0: one-shot latch so we know whether testimize reported a
145-
// load status within the grace window. Used for the defensive
146-
// fallback below.
147-
var testimizeLoadReported = false;
148144
using var subscription = session.On(evt =>
149145
{
150146
if (evt is AssistantUsageEvent usage && usage.Data is { } usageData)
@@ -168,7 +164,6 @@ public async Task<GenerationResult> GenerateTestsAsync(
168164
DebugLog($"TESTIMIZE LOADED server={name} status={status} source={s.Source ?? "?"}{errorSuffix}");
169165
if (string.Equals(name, "testimize", StringComparison.OrdinalIgnoreCase))
170166
{
171-
testimizeLoadReported = true;
172167
if (status.Equals("Connected", StringComparison.OrdinalIgnoreCase))
173168
{
174169
_onStatus?.Invoke("Testimize connected — algorithmic test data available");
@@ -184,10 +179,6 @@ public async Task<GenerationResult> GenerateTestsAsync(
184179
if (evt is SessionMcpServerStatusChangedEvent changed && changed.Data is { } changedData)
185180
{
186181
DebugLog($"TESTIMIZE STATUS_CHANGED server={changedData.ServerName} status={changedData.Status}");
187-
if (string.Equals(changedData.ServerName, "testimize", StringComparison.OrdinalIgnoreCase))
188-
{
189-
testimizeLoadReported = true;
190-
}
191182
return;
192183
}
193184
if (evt is ToolExecutionStartEvent toolStart)
@@ -218,24 +209,16 @@ public async Task<GenerationResult> GenerateTestsAsync(
218209
}
219210
});
220211

221-
// v1.46.0: defensive 3-second grace window — if testimize was
222-
// configured but the SDK hasn't reported a load status by the
223-
// time we're ready to send the prompt, log an UNHEALTHY line and
224-
// continue. The SDK still owns the process; we're just warning
225-
// the user that the MCP server may not be responding.
226-
if (mcpServers is not null)
227-
{
228-
var graceDeadline = DateTime.UtcNow.AddSeconds(3);
229-
while (!testimizeLoadReported && DateTime.UtcNow < graceDeadline)
230-
{
231-
await Task.Delay(100, ct);
232-
}
233-
if (!testimizeLoadReported)
234-
{
235-
DebugLog("TESTIMIZE UNHEALTHY server=testimize reason=no_load_event_within_3s");
236-
_onStatus?.Invoke("Testimize server didn't report load status within 3s — AI may still succeed without it");
237-
}
238-
}
212+
// v1.46.1: the old 3-second grace window was deleted. The
213+
// Copilot CLI's MCP client takes ~60 seconds to attempt +
214+
// time out the initialize handshake (JSON-RPC error -32001),
215+
// so a 3-second wait would always fire a false UNHEALTHY
216+
// line and then the real LOADED event would arrive much
217+
// later. Instead, we let the SDK fire SessionMcpServersLoadedEvent
218+
// whenever it's ready — the event handler above logs the
219+
// real status (Connected / Failed / NeedsAuth / ...) with
220+
// the actual error message. Generation proceeds immediately
221+
// because the SDK doesn't block session creation on MCP load.
239222

240223
// Build the combined prompt with system instructions and user request.
241224
// The profile format (JSON schema sent to the AI) is resolved from

0 commit comments

Comments
 (0)