Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.48.3] - 2026-04-12

Changed — Testimize migrated from MCP child process to in-process NuGet

  • Direct Testimize 1.1.10 NuGet reference in Spectra.CLI.csproj. The library runs in the same process as the CLI — no child process spawn, no JSON-RPC, no MCP handshake.
  • New TestimizeRunner at src/Spectra.CLI/Agent/Testimize/TestimizeRunner.cs orchestrates the in-process call. Maps FieldSpec values from behavior analysis onto TestimizeInputBuilder.Add* calls, maps config.Testimize.Strategy / Mode strings onto the library's enums, and calls TestimizeEngine.Configure(...).Generate(). Runs once per suite between behavior analysis and the batch generation loop (not once per batch).
  • Behavior-analysis schema extended with a top-level field_specs[] array. The AI emits structured field constraints (numeric ranges, string lengths, date ranges, enumerated values, required flags, expected error messages) in the same JSON response as behaviors. BehaviorAnalyzer.ParseFieldSpecs deserialises it. When the AI returns no field specs, TestimizeRunner falls back to the local FieldSpecAnalysisTools.Analyze regex extractor over the raw documentation.
  • Test-generation prompt template rewritten: the old {{#if testimize_enabled}} block that instructed the model to call testimize/generate_hybrid_test_cases / testimize/generate_pairwise_test_cases is replaced by {{#if testimize_dataset}}, which embeds a literal PRE-COMPUTED ALGORITHMIC TEST DATA (from Testimize, strategy=…) YAML block with the exact values, categories, and expected error messages the engine produced. The model no longer has to "choose" to use Testimize — the values are in the prompt as authoritative facts.
  • New shared types Spectra.Core.Models.Testimize.FieldSpec, TestimizeDataset, TestimizeRow, TestimizeCell — contract between behavior analysis, TestimizeRunner, and the prompt embedder. Spectra.Core has no dependency on Testimize itself, keeping the library surface isolated to Spectra.CLI.
  • testimizeSettings.json ships with the tool at src/Spectra.CLI/Templates/testimizeSettings.json. Copied to the tool's output directory so Testimize.FakerFactory.Initialize finds it at runtime (without it, the library throws NullReferenceException before generating anything).
  • TESTIMIZE OK strategy=… fields=… test_data_sets=… elapsed=…s log line in .spectra-debug.log, once per suite, when generation succeeds. Replaces TESTIMIZE CONFIGURED / LOADED / STATUS_CHANGED / DISPOSED / GATE mcp_loaded lines. New skip reasons: disabled, no_field_specs, insufficient_fields (Testimize requires ≥ 2 parameters). Also TESTIMIZE FALLBACK source=regex fields=<n> when the regex extractor recovers from an empty AI response, and TESTIMIZE ERROR exception=<Type> message="…" when the engine throws.

Removed

  • src/Spectra.CLI/Agent/Copilot/McpConfigBuilder.cs + tests — no longer needed, MCP server registration is gone.
  • src/Spectra.CLI/Agent/Testimize/TestimizeDetector.cs + tests — the dotnet tool list -g probe for the old global Testimize.MCP.Server tool is obsolete.
  • tests/Spectra.CLI.Tests/Agent/Copilot/TestimizeStrategyResolverTests.cs — tested CopilotGenerationAgent.ResolveTestimizeStrategyToolName, which mapped config strategy to MCP tool names. The helper is deleted.
  • MCP event subscription branches in CopilotGenerationAgent: SessionMcpServersLoadedEvent, SessionMcpServerStatusChangedEvent, mcpServersLoadedTcs TaskCompletionSource, and the 30-second GATE mcp_loaded block added in 1.48.2.
  • mcpServers parameter from CopilotService.CreateGenerationSessionAsync.
  • TestimizeMcpConfig nested class and Mcp property from TestimizeConfig. Old configs with testimize.mcp.command / args still parse cleanly — the JSON serializer ignores the unknown keys — but those values no longer affect anything.
  • testimize.mcp block from the default spectra.config.json template.
  • InstallCommand field from TestimizeCheckResult — there is no global tool to install anymore.

Rewritten

  • TestimizeCheckHandler now probes the Testimize assembly via reflection (typeof(Testimize.Usage.TestimizeEngine).Assembly) instead of shelling out to dotnet tool list -g. "Installed" means "assembly is loadable in the current process" — always true when Spectra.CLI is running, since it's a compile-time PackageReference. Healthy means Enabled && Installed.

Why

Across every log from 1.48.1 and 1.48.2 the Copilot SDK consistently fired TESTIMIZE LOADED ~700 ms after BATCH START, never before. The SDK loads MCP servers lazily — the child process doesn't spawn until the first SendAndWaitAsync — so the model's tool catalog is frozen before testimize's tools can reach it. Every attempt to gate the send on SessionMcpServersLoadedEvent timed out because the event cannot fire until the send it's trying to gate. The in-process path sidesteps the entire race: by the time the prompt is built, the test data already exists as strings that go straight into the prompt, and the model uses them verbatim.

Notes

  • All tests still pass: 514 Core / 351 MCP / 839 CLI (1704 total). New TestimizeRunnerTests covers strategy/mode enum mapping, empty-spec skip, insufficient-fields skip, regex fallback recovery, Integer+Email, Date+Integer, and SingleSelect+Integer happy paths.
  • Testimize prints its output-generator result to stdout and attempts to copy it to the clipboard on every run. TestimizeRunner temporarily redirects Console.Out during the Generate() call so neither corrupts --output-format json nor fails on headless CI.
  • Testimize's generators require ≥ 2 parameters (the error text is "Pairwise testing requires at least two parameters."). Single-field suites skip cleanly with TESTIMIZE SKIP reason=insufficient_fields fields=1 ….
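The stdout redirection mentioned in the notes above can be sketched roughly as follows. This is a minimal illustration, not the actual TestimizeRunner code: `ConsoleSilencer` and `Run` are hypothetical names, and the sketch assumes the noisy library writes through `Console.Out` rather than a raw stdout handle.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Spectra): runs an action while Console.Out
// is swapped for a discard-everything writer, then restores the previous writer.
public static class ConsoleSilencer
{
    public static T Run<T>(Func<T> action)
    {
        TextWriter original = Console.Out;
        Console.SetOut(TextWriter.Null);
        try
        {
            return action();
        }
        finally
        {
            // Restore even when the action throws, so later output is not lost.
            Console.SetOut(original);
        }
    }
}
```

A caller would wrap only the noisy call, for example `ConsoleSilencer.Run(() => engine.Generate())`, where `engine` stands in for whatever object exposes the generation call.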

[1.48.0] - 2026-04-12

Added — Spec 043: Parallel Critic Verification & Error Log

  • ai.critic.max_concurrent config option (default 1, max 20) — runs the critic verification phase in parallel using a SemaphoreSlim throttle. With max_concurrent: 5 on a 200-test suite, the critic phase typically completes in ~1/5 of sequential time. Output is unchanged: results are written into a pre-sized array indexed by original position, so test files, indexes, and verdicts come back in the same order regardless of completion order. Manual-verdict short-circuit is preserved. Default of 1 keeps the byte-identical sequential path.
  • .spectra-errors.log dedicated error log — companion to .spectra-debug.log. Written only when debug.enabled: true AND at least one error occurs during the run (lazy file creation; clean runs leave it untouched). Captures full exception type, message, response body (truncated to 500 chars), Retry-After header, and stack trace. New debug.error_log_file config (default .spectra-errors.log) and matching mode (append/overwrite) following the debug log. Errors are captured at every catch site that talks to the AI runtime: BehaviorAnalyzer, CopilotGenerationAgent, CopilotCritic, CriteriaExtractor (via AnalyzeHandler), and UpdateHandler. Each captured error gets a see=.spectra-errors.log cross-reference suffix on the corresponding debug log line.
  • Rate limit + error counts in Run Summary — new Errors, Rate limits, and Critic concurrency rows in the Spectre.Console panel. When Rate limits > 0, a hint (consider reducing ai.critic.max_concurrent) appears next to the count. The same counts suffix the RUN TOTAL debug log line as rate_limits=<n> errors=<n> (always present, even on zero runs, for grep-friendly CI consumption).
  • ErrorLogger static class at src/Spectra.CLI/Infrastructure/ErrorLogger.cs — mirrors DebugLogger's static-class pattern. Thread-safe via single lock. File-write failures are caught, emit one stderr warning, then disable further writes for the run (graceful degradation; never aborts the run). Includes IsRateLimit(Exception) classifier (HTTP 429 / HttpStatusCode.TooManyRequests / message-substring fallbacks for 429, rate limit, too many requests, rate_limit_exceeded).
  • RunErrorTracker instance class at src/Spectra.CLI/Services/RunErrorTracker.cs — per-run counters with Errors + RateLimits properties; thread-safe via Interlocked. Lives on GenerateHandler and UpdateHandler and is forwarded into BehaviorAnalyzer, CopilotGenerationAgent, CriticFactory.TryCreate, and CopilotCritic constructor.
  • +14 new tests: CriticConfigClampTests (clamping ≤0→1, >20→20, in-range pass-through, JSON roundtrip), ErrorLoggerTests (file lazy-creation, disabled no-op, stack trace + response body capture, response truncation at 500, retry-after, multi-error append, concurrent-write thread safety, IsRateLimit classifier), RunErrorTrackerTests (counter atomicity under 1000-task contention), and 2 new RunSummaryDebugFormatterTests cases for the error/rate-limit suffixes.
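The throttled, order-preserving critic phase described above can be sketched as follows. This is an illustration under stated assumptions, not Spectra's implementation: `BoundedFanOut`, `RunAsync`, and the `work` delegate are hypothetical names, and the 1 to 20 clamp simply mirrors the documented limits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: throttled fan-out that returns results in input order.
public static class BoundedFanOut
{
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IReadOnlyList<TItem> items,
        int maxConcurrent,
        Func<TItem, Task<TResult>> work)
    {
        // Clamp to the documented range (values <= 0 become 1, values > 20 become 20).
        int limit = Math.Clamp(maxConcurrent, 1, 20);

        // Pre-sized array: slot i always holds the result for items[i],
        // regardless of which task finishes first.
        var results = new TResult[items.Count];

        using var gate = new SemaphoreSlim(limit, limit);

        Task[] tasks = Enumerable.Range(0, items.Count).Select(async i =>
        {
            await gate.WaitAsync();
            try
            {
                results[i] = await work(items[i]);
            }
            finally
            {
                gate.Release();
            }
        }).ToArray();

        await Task.WhenAll(tasks);
        return results;
    }
}
```

Because each task writes only to its own index, no lock is needed around the array, and a limit of 1 degenerates to sequential execution.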

Changed

  • RunSummaryDebugFormatter.FormatRunTotal gained an optional RunErrorTracker parameter; line now always ends with rate_limits=<n> errors=<n>.
  • RunSummaryPresenter.Render gained optional errorTracker and criticConcurrency parameters that drive the new rows.
  • CopilotCritic.VerifyTestAsync distinguishes rate-limit failures from generic exceptions in the debug log (CRITIC RATE_LIMITED vs. CRITIC ERROR).
  • CriticFactory.TryCreate and TryCreateAsync accept an optional RunErrorTracker parameter and forward it to CopilotCritic.
  • AgentFactory.CreateAgentAsync accepts an optional RunErrorTracker parameter and forwards it to CopilotGenerationAgent.
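The rate-limit classification and per-run counters in this release can be sketched as follows. The type names (`RateLimitClassifier`, `RunErrorCounters`) are hypothetical and the real ErrorLogger and RunErrorTracker may differ; in particular, counting a rate-limited call toward both totals is an assumption of this sketch.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;

// Hypothetical sketch of a rate-limit classifier with message-substring fallbacks.
public static class RateLimitClassifier
{
    private static readonly string[] Markers =
        { "429", "rate limit", "too many requests", "rate_limit_exceeded" };

    public static bool IsRateLimit(Exception ex)
    {
        // Prefer the structured status code when the exception carries one.
        if (ex is HttpRequestException { StatusCode: HttpStatusCode.TooManyRequests })
            return true;

        // Otherwise fall back to case-insensitive substring checks on the message.
        foreach (string marker in Markers)
        {
            if (ex.Message.Contains(marker, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}

// Hypothetical per-run counters; increments are atomic via Interlocked.
public sealed class RunErrorCounters
{
    private int _errors;
    private int _rateLimits;

    public int Errors => Volatile.Read(ref _errors);
    public int RateLimits => Volatile.Read(ref _rateLimits);

    // Assumption: a rate-limited call counts toward both totals.
    public void Record(Exception ex)
    {
        Interlocked.Increment(ref _errors);
        if (RateLimitClassifier.IsRateLimit(ex))
            Interlocked.Increment(ref _rateLimits);
    }
}
```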

Notes

  • max_concurrent > 10 emits a one-line stderr warning at run start about provider rate-limit risk. max_concurrent > 20 emits a "clamped to 20" warning.
  • The error log uses lazy file creation: on a clean run, no .spectra-errors.log file is touched, so its mere existence indicates "this run had at least one failure".
  • All existing tests still pass (513 Core / 351 MCP / 827 CLI). Adds 14 new tests.

[1.47.1] - 2026-04-11

Added

  • TOOL CALL lines in .spectra-debug.log for the generation flow. CopilotGenerationAgent now subscribes to ToolExecutionStartEvent + ToolExecutionCompleteEvent from the Copilot SDK and emits one line per tool invocation, with elapsed time computed by correlating start/complete via ToolCallId. Format: TOOL CALL tool=<name> mcp_server=<server-or--> elapsed=<sec>s success=<bool>. For MCP tools the SDK exposes McpServerName + McpToolName directly, so testimize calls show as tool=generate_hybrid_test_cases mcp_server=testimize elapsed=1.23s success=true — making it possible to verify (a) whether the AI is actually invoking testimize tools during generation, or just ignoring them, and (b) which built-in tools (ReadDocument, GetNextTestIds, BatchWriteTests, ...) take how long. Previously only the testimize lifecycle (CONFIGURED/LOADED/STATUS_CHANGED/DISPOSED) was logged, with no visibility into per-call usage.

Notes

  • Tool-call tracking is per-session, scoped to a single GenerateTestsAsync call. The correlation dictionary is ConcurrentDictionary<string, ...> since SDK events fire from internal tasks.
  • Only added to the generation agent (CopilotGenerationAgent). The other Copilot-SDK agents — BehaviorAnalyzer, CriteriaExtractor, GroundingAgent (critic) — don't currently emit TOOL CALL lines. They could be extended the same way if needed; the only flow where MCP tools (testimize) are wired in is generation, which is what this change targets.
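The start/complete correlation described above can be sketched as follows. `ToolCallTimer` and its method names are hypothetical, and the SDK event types are deliberately left out: the sketch only assumes that both events expose the same call ID string.

```csharp
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Globalization;

// Hypothetical sketch: correlate start/complete notifications by call ID
// and format one log line per completed tool call.
public sealed class ToolCallTimer
{
    // Concurrent because the notifications may arrive on different threads.
    private readonly ConcurrentDictionary<string, long> _started = new();

    public void OnStart(string toolCallId) =>
        _started[toolCallId] = Stopwatch.GetTimestamp();

    // Returns null when no matching start was recorded.
    public string? OnComplete(string toolCallId, string tool, string? mcpServer, bool success)
    {
        if (!_started.TryRemove(toolCallId, out long start))
            return null;

        double seconds = (Stopwatch.GetTimestamp() - start) / (double)Stopwatch.Frequency;
        return string.Format(
            CultureInfo.InvariantCulture,
            "TOOL CALL tool={0} mcp_server={1} elapsed={2:F2}s success={3}",
            tool,
            mcpServer ?? "-",
            seconds,
            success ? "true" : "false");
    }
}
```

Removing the entry on completion keeps the dictionary from growing over a long session and makes a duplicate completion a harmless no-op.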

[1.47.0] - 2026-04-11

Added

  • Live progress bars for spectra ai generate and spectra ai update (spec 041). Long runs (--count 40+) now show real progress feedback in two places:
    • Terminal: a Generating tests Spectre.Console progress bar advances per batch, followed by a Verifying tests bar that increments once per critic call (showing the most recent test ID and verdict). For update, an Updating tests bar advances per applied proposal.
    • Browser progress page (.spectra-progress.html): a new live progress section renders the same data driven by an in-flight progress object inside .spectra-result.json, picked up on the existing 1.5s auto-refresh cycle. Verification bar is dimmed during the generation phase, then activated.
  • progress field in .spectra-result.json — present only while a run is in flight (snapshot of phase, target, generated, verified, current batch, total batches, last test ID, last verdict). Removed automatically by ProgressManager.Complete() / Fail() so the final result file shows runSummary instead. Schema documented in specs/041-progress-bars/contracts/result-json-progress-schema.md.
  • ProgressReporter.ProgressTwoTaskAsync(...) — new wrapper that creates two sequential Spectre tasks inside a single live region. Handles passed to the action implement a small IProgressTaskHandle abstraction so command handlers don't take a direct dependency on Spectre internals.
  • ProgressSnapshot + ProgressPhase model at src/Spectra.CLI/Progress/ProgressSnapshot.cs.
  • +17 unit tests: ProgressBarTests, ProgressManagerProgressFieldTests, ProgressPageProgressBarTests covering suppression rules, progress field clear-on-complete/fail, and HTML rendering for generation/verifying/updating phases.

Changed

  • ProgressReporter suppression rules — now suppresses progress bars not just under --output-format json, but also under --verbosity quiet and when stdout is non-interactive (redirected, piped, or no TTY). SKILL/CI invocations remain ANSI-free.
  • ProgressReporter.StatusAsync — short-circuits when the reporter is inside an active ProgressTwoTaskAsync block, so existing inline _progress.StatusAsync(...) spinners inside the batch loop don't conflict with the outer progress bars.
  • UpdateHandler.ApplyChangesAsync — gained an optional onProposalApplied callback parameter for per-proposal progress reporting. Existing call sites unaffected.
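The suppression rules above reduce to a small predicate. The following is a sketch only: `ProgressGate` and `ShouldRender` are hypothetical names, the option values are passed as plain strings for illustration, and `Console.IsOutputRedirected` is used here as an approximation of "stdout is non-interactive".

```csharp
using System;

// Hypothetical sketch of the progress-bar suppression rules.
public static class ProgressGate
{
    public static bool ShouldRender(string outputFormat, string verbosity, bool stdoutRedirected)
    {
        // Machine-readable output must stay free of progress rendering.
        if (string.Equals(outputFormat, "json", StringComparison.OrdinalIgnoreCase))
            return false;

        // Quiet mode suppresses bars as well.
        if (string.Equals(verbosity, "quiet", StringComparison.OrdinalIgnoreCase))
            return false;

        // Redirected or piped stdout cannot render a live region.
        return !stdoutRedirected;
    }

    // Convenience overload that inspects the current process.
    public static bool ShouldRender(string outputFormat, string verbosity) =>
        ShouldRender(outputFormat, verbosity, Console.IsOutputRedirected);
}
```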

Notes

  • ETA / time-remaining display, per-test progress within a single generation batch, and formal cancel-with-cleanup on Ctrl+C are explicitly out of scope. See specs/041-progress-bars/spec.md.
  • --output-format json and --verbosity quiet behavior is unchanged: SKILLs and CI scripts see no progress-bar output on stdout.

[1.46.1] - 2026-04-11

Fixed

  • False TESTIMIZE UNHEALTHY no_load_event_within_3s line in debug log. The 3-second grace window in GenerationAgent was based on a wrong assumption about how fast the Copilot CLI's MCP client handshakes with a server — in reality it takes ~60s to attempt + time out an initialize against a non-spec-compliant server. Removed the grace window entirely. The SDK's SessionMcpServersLoadedEvent is the single source of truth; when it fires (success or failure), the real status is logged with the real error message. Generation proceeds immediately because the SDK does not block session creation on MCP loading.
  • Session-companion fix in Testimize.MCP.Server (separate repo): the server previously emitted JSON-RPC responses with both "result" and "error":null, which is a JSON-RPC 2.0 spec violation — the spec says a response MUST contain result OR error, not both. The Copilot CLI's MCP client (built on @modelcontextprotocol/sdk) correctly rejects malformed responses and times out at ~60s with MCP error -32001: Request timed out. Fix: serialize only the field that applies. See Testimize.MCP.Server/Program.cs change; requires rebuilding testimize-mcp 1.1.10 from source and reinstalling.
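The "serialize only the field that applies" fix can be illustrated with System.Text.Json. This is a sketch and not the actual Testimize.MCP.Server code: the `JsonRpcResponse` type below is hypothetical, and it relies on `JsonIgnoreCondition.WhenWritingNull` to drop whichever member is unset.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical response DTO: a null "result" or "error" is omitted entirely
// instead of being written as an explicit null.
public sealed class JsonRpcResponse
{
    [JsonPropertyName("jsonrpc")]
    public string JsonRpc { get; init; } = "2.0";

    [JsonPropertyName("id")]
    public int Id { get; init; }

    [JsonPropertyName("result")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public object? Result { get; init; }

    [JsonPropertyName("error")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public object? Error { get; init; }

    public string ToJson() => JsonSerializer.Serialize(this);
}
```

With this shape, a success response carries only `result` and a failure response carries only `error`, which is what a strict JSON-RPC 2.0 client expects.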

Notes

  • Zero code-path changes beyond the 3s grace removal — token tracking, debug log formatting, run summary, RUN TOTAL line, and the native SessionConfig.McpServers wiring from v1.46.0 are all untouched.
  • The SDK's behavior (blocking vs non-blocking session creation on MCP load) was empirically verified by observing the timing in .spectra-debug.log: session creation completed immediately, generation ran in parallel with the MCP handshake, and the load event arrived ~68s later with status=Failed.

[1.46.0] - 2026-04-11

Added

  • Native MCP integration for Testimize: spectra ai generate now passes Testimize to the Copilot SDK via its native SessionConfig.McpServers API. The SDK owns the full lifecycle: process spawn, initialize handshake, framing variant (Content-Length OR newline-delimited), tool discovery, cost attribution (AssistantUsageData.Initiator = "mcp-sampling"), permission prompts, and OAuth. No more custom JSON-RPC protocol client.
  • McpConfigBuilder.BuildTestimizeServer(TestimizeConfig) — small static helper that translates the user-facing testimize config block into an McpLocalServerConfig for the SDK. Unit-tested in isolation.
  • CopilotService.CreateGenerationSessionAsync(..., mcpServers) — new optional parameter on the session factory that forwards an MCP server map into SessionConfig.McpServers. Existing callers unaffected.
  • SDK-driven testimize lifecycle in .spectra-debug.log — the agent subscribes to SessionMcpServersLoadedEvent and SessionMcpServerStatusChangedEvent and writes TESTIMIZE CONFIGURED / LOADED / STATUS_CHANGED / DISPOSED lines driven by the SDK's own status enum (Connected / Failed / NeedsAuth / Pending / Disabled / NotConfigured). No more 5-second probe hangs.
  • testimize_strategy prompt placeholder — the test-generation.md template now embeds the preferred testimize tool name (derived from testimize.strategy) so the AI knows whether to prefer testimize/generate_hybrid_test_cases (HybridArtificialBeeColony, default) or testimize/generate_pairwise_test_cases (fast Pairwise).
  • Defensive 3-second grace window — if testimize is configured but the SDK hasn't reported a load status by the time the handler is ready to send the prompt, the agent logs TESTIMIZE UNHEALTHY reason=no_load_event_within_3s and continues. The AI will still succeed, just without testimize's optimized test data.
  • +17 tests: McpConfigBuilderTests (7 cases) and TestimizeStrategyResolverTests (4 theory cases + 1 fact + 1 edge), plus updates to TestimizeConditionalBlockTests and AnalyzeFieldSpecTests to reflect the new tool names.

Fixed

  • Testimize health probe 5-second hang. Root cause: our custom TestimizeMcpClient used Content-Length: N\r\n\r\n<body> MCP-spec-v1 framing, but Testimize.MCP.Server uses newline-delimited JSON (ReadLineAsync/WriteLineAsync). Worse, Spectra wrote the JSON body with no trailing \n, so testimize's ReadLineAsync blocked forever waiting for one — while Spectra's own ReadLineAsync blocked waiting for a response. Exactly 5 seconds later, the probe timeout fired. Deleting the client and delegating to the SDK's native MCP implementation fixes this class of bug permanently (the SDK handles both framing variants correctly).
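The framing mismatch behind the hang can be made concrete with a small sketch. The helper names below are hypothetical and nothing here is taken from the deleted client; the sketch only contrasts the two wire formats and shows what a line-oriented reader sees first.

```csharp
using System.IO;
using System.Text;

// Hypothetical helpers contrasting the two stdio framings.
public static class StdioFraming
{
    // Header-based framing: a byte count, a blank line, then the body
    // with no trailing newline.
    public static string ContentLength(string json) =>
        $"Content-Length: {Encoding.UTF8.GetByteCount(json)}\r\n\r\n{json}";

    // Newline-delimited framing: one JSON message per line.
    public static string NewlineDelimited(string json) => json + "\n";

    // The first line a ReadLine-based peer would receive from the wire text.
    public static string? FirstLine(string wire) => new StringReader(wire).ReadLine();
}
```

For the body `{}`, `FirstLine(ContentLength("{}"))` yields the header `Content-Length: 2` rather than JSON, while `FirstLine(NewlineDelimited("{}"))` yields `{}`. On a live pipe the header-framed body never ends with a newline, so a line-oriented reader keeps waiting for one.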

Removed

  • src/Spectra.CLI/Agent/Testimize/TestimizeMcpClient.cs — the 240-line custom JSON-RPC-over-stdio client with the Content-Length framing bug. Gone.
  • src/Spectra.CLI/Agent/Testimize/TestimizeTools.cs — the CreateGenerateTestDataTool AIFunction wrapper that routed to the broken client. Gone. The AI now calls the real testimize MCP tools directly.
  • Tests for the deleted classes: TestimizeMcpClientTests, TestimizeMcpClientGracefulTests, TestimizeToolsTests. Also deleted one obsolete test from TestimizeCheckHandlerTests (Check_EnabledButBogusCommand_...) that asserted old-client behavior.
  • spectra testimize check's "healthy" field now just mirrors installed — the real runtime health check moved to spectra ai generate where the debug log's TESTIMIZE LOADED server=testimize status=Connected line is the authoritative signal.

Changed

  • FieldSpecAnalysisTools.cs (new file, replaces the surviving half of TestimizeTools.cs) — contains CreateAnalyzeFieldSpecTool + the local regex-based ExtractFields helper. No behavior change; same signatures, new class name scoped to what the code actually does.
  • test-generation.md prompt template — the {{#if testimize_enabled}} block now names both real testimize MCP tools explicitly, references the new {{testimize_strategy}} placeholder, and instructs the AI to call AnalyzeFieldSpec first for prose-described fields.
  • behavior-analysis.md prompt template — same update: references testimize/generate_hybrid_test_cases / testimize/generate_pairwise_test_cases instead of the deleted GenerateTestData wrapper name.

Notes

  • The user-facing config schema (testimize.enabled, testimize.mcp.command, testimize.mcp.args, testimize.strategy, testimize.mode) is unchanged. Existing spectra.config.json files continue to work with no edits.
  • TestimizeDetector.IsInstalledAsync() is unchanged — still checks dotnet tool list -g for testimize.mcp.server.

[1.45.2] - 2026-04-11

Added

  • Run header in .spectra-debug.log — every generate / update run now writes a header block at the start containing the SPECTRA version, the full user command line, and an ISO-8601 UTC timestamp:
    ──────────────────────────────────────────────────────────────
    === SPECTRA v1.45.2 | spectra ai generate checkout --count 20 | 2026-04-11T14:30:00Z ===
    
    Makes each run self-identifying when grepping or comparing logs across days.
  • debug.mode config option with two values:
    • "append" (default): keeps existing log content and prepends a horizontal separator + header before each new run. Good for comparing multiple runs post-hoc.
    • "overwrite": truncates .spectra-debug.log at run start so only the latest run is preserved. Good for focused troubleshooting.
    Unknown values fall back to "append". Wired from GenerateHandler and UpdateHandler at run start via new DebugLogger.BeginRun().
  • New DebugLogger.BeginRun() static method that materializes the header and honors the mode. Best-effort — never throws, no-op when debug logging is disabled.
  • Version is read via AssemblyInformationalVersionAttribute so the header reflects the actual installed build.
  • Command line is reconstructed from Environment.GetCommandLineArgs() (skipping the dotnet tool shim path) so it looks like what the user typed.
  • +11 tests: BeginRun_Disabled_NoFileWritten, BeginRun_Overwrite_TruncatesAndWritesHeader, BeginRun_Append_ExistingFile_PrependsSeparatorAndHeader, BeginRun_Append_MissingFile_CreatesWithHeader, BeginRun_UnknownMode_TreatedAsAppend, BeginRun_HeaderIncludesIsoTimestamp, BeginRun_ThenAppendAi_LogContainsHeaderAndAiLine, BeginRun_ThenAppendTestimize_LinesSurvive (regression test for the mode=overwrite truncate), plus DebugConfigTests.Default_Mode_IsAppend and two deserialization round-trip tests.
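The header format and the two modes can be sketched as follows. This is not the real DebugLogger: `RunHeader`, `Format`, and `Begin` are hypothetical names, the separator width of 62 characters is arbitrary, and the exact string comparison on the mode value is an assumption.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical sketch of the run-header logic.
public static class RunHeader
{
    public static string Format(string version, string commandLine, DateTime utc) =>
        $"=== SPECTRA v{version} | {commandLine} | " +
        utc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture) +
        " ===";

    public static void Begin(string path, string mode, string header)
    {
        if (mode == "overwrite")
        {
            // Truncate so only the latest run survives.
            File.WriteAllText(path, header + Environment.NewLine);
            return;
        }

        // "append" and any unknown value: keep old content and separate runs visually.
        string separator = File.Exists(path)
            ? new string('─', 62) + Environment.NewLine
            : string.Empty;
        File.AppendAllText(path, separator + header + Environment.NewLine);
    }
}
```

A production version would also wrap the file writes in a try/catch to match the best-effort, never-throws behaviour described above; that is omitted here for brevity.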

Removed

  • .spectra-debug-analysis.txt — no longer written by BehaviorAnalyzer on ANALYSIS PARSE_FAIL. The parse-fail reason is still logged to .spectra-debug.log.
  • .spectra-debug-response.txt — no longer written by GenerationAgent after each batch. The SaveDebugResponse method and its call site have been deleted. The stale "Check .spectra-debug-response.txt for the raw AI output" error message now points at .spectra-debug.log.
  • All debug output now flows through a single file: .spectra-debug.log.

Verified

  • Testimize lifecycle lines (TESTIMIZE DISABLED / START / NOT_INSTALLED / UNHEALTHY / HEALTHY / DISPOSED) are written via DebugLogger.Append("generate", ...) after BeginRun, so they survive the mode=overwrite truncate. Covered by BeginRun_ThenAppendTestimize_LinesSurvive.

[1.45.1] - 2026-04-11

Added

  • Real token counts from AssistantUsageEvent (spec 040 follow-up) — the Copilot SDK surfaces provider-reported InputTokens / OutputTokens on a separate AssistantUsageEvent (not on AssistantMessageEvent as Spectra previously assumed). Every AI call site (BehaviorAnalyzer, GenerationAgent batch loop, GroundingAgent critic, CriteriaExtractor — both methods) now subscribes to this event via a new CopilotUsageObserver helper, waits a 200ms grace after the request for ordering-safe capture, and records the real counts into TokenUsageTracker. No more ? placeholders in the normal case.
  • text.Length / 4 fallback when the usage event fails to arrive within the grace window — captured by the new TokenEstimator service. Fallback values are flagged end-to-end: ~tokens_in=… / ~tokens_out=… in the debug log and "estimated": true on both the token_usage report and each affected phase DTO in JSON.
  • RUN TOTAL summary line written to .spectra-debug.log at the end of every generate / update run, via the new RunSummaryDebugFormatter. Format: RUN TOTAL command=generate suite=checkout calls=24 tokens_in=64480 tokens_out=24240 elapsed=2m45s phases=analysis:1/18.5s,generation:3/52.3s,critic:20/1m34s. Makes the log self-summarizing — one grep and you have the run-level numbers. When any phase used the estimate fallback, the totals gain ~ prefixes.
  • New services: TokenEstimator, CopilotUsageObserver, RunSummaryDebugFormatter.
  • +36 tests (TokenEstimatorTests, CopilotUsageObserverTests, RunSummaryDebugFormatterTests, TokenUsageReportEstimatedTests, plus extensions to TokenUsageTrackerTests, DebugLoggerTests, GenerateResultTokenUsageTests).
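The length-based fallback and the `~` marker can be sketched as follows. The names are hypothetical and the real TokenEstimator may differ; the only details taken from the entry above are the `text.Length / 4` heuristic and the `~` prefix on estimated values.

```csharp
using System;

// Hypothetical sketch of the estimate fallback and its log formatting.
public static class TokenEstimate
{
    // Roughly four characters per token; integer division rounds down.
    public static int FromText(string? text) => (text?.Length ?? 0) / 4;

    // Formats a debug-log fragment, prefixing "~" when the value is an estimate.
    public static string Format(string key, int tokens, bool estimated) =>
        (estimated ? "~" : "") + key + "=" + tokens;
}
```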

Changed

  • PhaseUsage, TokenUsageTracker.Record, TokenUsageReport, PhaseUsageDto, and DebugLogger.AppendAi all carry a new bool estimated field / parameter. The flag propagates via logical OR during aggregation — any estimated call in a phase flags the whole phase (and the run total) as estimated. Backwards compatible: the parameter defaults to false.
  • UpdateHandler emits the RUN TOTAL line with calls=0 phases= as a run-boundary marker (no AI calls today — still useful when grepping the log).

Notes

  • Event ordering between AssistantUsageEvent and SessionIdleEvent is undocumented in the SDK. The 200ms grace via TaskCompletionSource handles both orderings: returns immediately if usage arrived before idle, waits briefly if it arrives after. On timeout, the length-based estimate kicks in.
  • AssistantUsageData.InputTokens / OutputTokens are double? in the SDK assembly (not int as the XML doc implies) — we cast with (int)(value ?? 0).
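The grace-window pattern from these notes can be sketched with `Task.WhenAny`. This is an illustration, not the actual CopilotUsageObserver: `UsageGrace` and `ResolveAsync` are hypothetical, and the sketch assumes an event handler elsewhere completes a `TaskCompletionSource<double?>` when the usage event arrives.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: wait briefly for the reported count, else estimate.
public static class UsageGrace
{
    public static async Task<(int Tokens, bool Estimated)> ResolveAsync(
        Task<double?> usageArrived, string responseText, TimeSpan grace)
    {
        // Completes immediately if usage already arrived; otherwise waits
        // at most for the grace period.
        Task first = await Task.WhenAny(usageArrived, Task.Delay(grace));

        if (first == usageArrived)
        {
            double? reported = await usageArrived;
            // The reported value is a nullable double, hence the cast.
            return ((int)(reported ?? 0), false);
        }

        // Timed out: fall back to the length-based estimate and flag it.
        return (responseText.Length / 4, true);
    }
}
```

Passing the task from a `TaskCompletionSource<double?>` covers both event orderings: if the source is already completed the method returns without delay, and if it completes during the grace window the real count is still used.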

[1.45.0] - 2026-04-11

Added

  • Interactive model preset menu (spec 041) — spectra init -i gains a new "AI Model Preset" step that offers four curated generator + critic pairings: (1) GPT-4.1 + GPT-5 mini (free, unlimited), (2) Claude Sonnet 4.5 + GPT-4.1 critic (premium, high quality), (3) GPT-4.1 + Claude Haiku 4.5 critic (free gen + cross-family critic), (4) Custom — writes preset 1 defaults so you can edit spectra.config.json by hand. Non-custom presets rewrite both ai.providers[0] and ai.critic in a single step and skip the separate granular critic wizard.
  • Embedded spectra.config.json template — the init config template is now an <EmbeddedResource> so spectra init produces the same file whether run from published (single-file) builds or from dev source. Previously the fallback path used ConfigLoader.GenerateDefaultConfig() which omitted the critic block entirely.

Changed

  • New default models (spec 041): spectra init now writes gpt-4.1 (generator) + gpt-5-mini (critic) instead of gpt-4o / gpt-4o-mini. Both are 0× multiplier on any paid Copilot plan, and they're from different model architectures so the dual-model critic (spec 008) provides genuine cross-verification. SpectraConfig.Default, CriticConfig.GetEffectiveModel(), CopilotCritic.GetEffectiveModel(), and CopilotService.GetCriticModel() all reflect the new defaults. Per-provider defaults: github-models / openai / azure-openai → gpt-5-mini; anthropic / azure-anthropic → claude-haiku-4-5.
  • spectra init -i granular critic step provider menu updated: dropped google (hard-error since spec 039), added github-models, updated anthropic / openai / azure-openai / azure-anthropic defaults to current model strings.

Backwards compatibility

  • Existing spectra.config.json files with gpt-4o / gpt-4o-mini (or any other valid model) continue to work unchanged. The Copilot SDK still routes those models. No migration is required — the new defaults only affect fresh spectra init runs.

[1.44.0] - 2026-04-11

Added

  • Token usage tracking & Run Summary (spec 040) — every spectra ai generate and spectra ai update run now prints a Run Summary panel (documents, behaviors, tests, verdict breakdown, duration) plus a per-phase Token Usage table grouped by (phase, model, provider) with an estimated USD cost for BYOK providers. github-models renders "Included in Copilot plan (rate limits apply)". New TokenUsageTracker (thread-safe, one instance per run) records every AI call across analysis, generation, critic, and criteria phases. Hardcoded rate table in CostEstimator covers gpt-4o, gpt-4o-mini, gpt-4.1 family, claude-sonnet-4, claude-3-5-haiku, deepseek-v3.2.
  • run_summary and token_usage in JSON output and .spectra-result.json — SKILLs and the live progress page read the same fields, so spectra-generate reports token totals + cost in its final summary message. The .spectra-progress.html page gains an AI Calls / Tokens In / Tokens Out / Total / AI Time card grid that updates as batches complete.
  • Model + provider in every AI debug log line — debug log format gains model=<name> provider=<name> tokens_in=<n|?> tokens_out=<n|?> suffix on all AI lines (ANALYSIS, BATCH, CRITIC, UPDATE, CRITERIA). Non-AI lines (testimize lifecycle, file I/O) are unchanged.
  • debug config section (spec 040) — new top-level debug block in spectra.config.json with enabled (bool, default false) and log_file (string, default .spectra-debug.log). spectra init writes the section disabled by default. --verbosity diagnostic force-enables debug logging for a single run without editing config.

Changed

  • .spectra-debug.log is now opt-in. Default Spectra runs produce zero debug log files, eliminating stale-file accumulation in CI environments. Existing configs without a debug section continue to load (no breaking change).
  • Unified TokenUsage record now lives in Spectra.Core.Models with PromptTokens / CompletionTokens fields (replaces the unused CLI-scoped InputTokens / OutputTokens record).

Removed

  • ai.debug_log_enabled has been removed from AiConfig. Use the new top-level debug.enabled field instead. Existing configs that still set ai.debug_log_enabled are silently ignored (System.Text.Json default behavior for unknown fields).

Notes

  • The Copilot SDK does not currently surface usage.prompt_tokens / completion_tokens on its response object, so token counts are recorded as null (rendered as ? in the debug log and 0 in the Run Summary table). All other Run Summary fields — call count, per-phase elapsed time, model, provider — are captured for every AI call. When the SDK begins exposing usage, a single read point in each agent picks it up with no further wiring.

[1.43.0]

Added

  • Tolerant analysis JSON parser (recovers from truncated reasoning-model output).
  • Shared DebugLogger writes CRITIC START/OK/TIMEOUT/ERROR and TESTIMIZE lifecycle lines to .spectra-debug.log.
  • Critic now honors critic.timeout_seconds (default bumped 30 → 120).
  • Progress page polls terminal pages every 5s for fresh-run detection.
  • Better analysis_failed error message.

[1.42.0]

Added

  • Configurable ai.analysis_timeout_minutes (default 2).
  • [analyze] lines in .spectra-debug.log.
  • spectra-generate SKILL handles analysis_failed status.

Changed

  • Analyze-only path returns status: "analysis_failed" with explanatory message instead of fake analyzed/15-recommended.

[1.41.0]

Added

  • Configurable per-batch generation timeout & batch size: ai.generation_timeout_minutes (default 5), ai.generation_batch_size (default 30), ai.debug_log_enabled (default true).
  • Per-batch BATCH START/OK/TIMEOUT lines in .spectra-debug.log.
  • Improved timeout error with remediation snippets.

[1.36.0] - 2026-04-10

Added

  • Quickstart SKILL & USAGE.md (spec 032) — spectra-quickstart is the 12th bundled SKILL: workflow-oriented onboarding for Copilot Chat triggered by phrases like "help me get started", "tutorial", "walk me through". Presents 12 SPECTRA workflows with example conversations. Companion USAGE.md written to project root by spectra init as an offline workflow reference. Both hash-tracked by update-skills. Generation and execution agent prompts defer onboarding requests to the new SKILL.
  • Visible default profile format & customization guide (spec 031) — profiles/_default.yaml and CUSTOMIZATION.md are now created by spectra init and bundled as embedded resources. Profile format is visible/editable instead of hardcoded. New ProfileFormatLoader.LoadEmbeddedDefaultYaml() and LoadEmbeddedCustomizationGuide() methods.
  • Customizable root prompt templates (spec 030) — .spectra/prompts/ directory with 5 markdown templates (behavior-analysis, test-generation, criteria-extraction, critic-verification, test-update) controlling all AI operations. Templates use {{placeholder}}, {{#if}}, {{#each}} syntax. New analysis.categories config section with 6 default categories. New spectra prompts list/show/reset/validate CLI commands. New spectra-prompts SKILL (11th bundled SKILL).
  • ProfileFormatLoader.LoadEmbeddedUsageGuide() for resolving the bundled USAGE.md content.
  • Bumped Spectre.Console 0.54.0 → 0.55.0, GitHub.Copilot.SDK 0.2.0 → 0.2.1, Markdig 1.1.1 → 1.1.2 (Dependabot PRs #10, #11, #12).

[1.35.0] - 2026-04-10

Added

  • spectra-update SKILL (10th bundled SKILL) for test update workflow via Copilot Chat
  • Agent delegation tables updated for update command routing
  • UpdateResult extended with totalTests, testsFlagged, flaggedTests, duration, success fields

[1.34.6] - 2026-04-09

Fixed

  • Dashboard test file path resolution & version bump

[1.34.0] - 2026-04-08

Fixed

  • Criteria coverage, index generation, dashboard SKILL & CI pragma fixes

[1.33.0] - 2026-04-07

Fixed

  • Criteria generation, polling, progress caching & CI build fixes

[1.32.0] - 2026-04-06

Changed

  • Refined SPECTRA agents/skills CLI workflows

[1.31.0] - 2026-04-05

Added

  • Coverage semantics fix & criteria-generation pipeline (spec 028)
  • TestCaseParser now propagates Criteria field from frontmatter to TestCase
  • Criteria loading wired into GenerateHandler for per-doc .criteria.yaml context

[1.30.0] - 2026-04-04

Added

  • Docs index SKILL integration, progress page, coverage fix & terminology rename (spec 024)
  • spectra-docs SKILL (9th bundled SKILL) with structured tool-call-sequence
  • --skip-criteria flag for docs index command

Fixed

  • Dashboard coverage null-crash fix with zero-state defaults

Historical Specs (pre-1.30 versioning)

Spec entries that predate the semver [N.N.N] release format. Listed newest-first by spec number.

  • 039-unify-critic-providers — critic provider list aligned with generator (canonical 5: github-models, azure-openai, azure-anthropic, openai, anthropic). github → github-models legacy alias with stderr warning; google is now hard error. New DefaultProvider constant; case-insensitive resolution.
  • 038-testimize-integration — optional Testimize.MCP.Server integration for algorithmic test data optimization (BVA/EP/pairwise/ABC). Disabled by default. New testimize config section, TestimizeMcpClient, two AI tools, spectra testimize check command. Graceful degradation everywhere.
  • 037-istqb-test-techniques — ISTQB black-box techniques (EP, BVA, DT, ST, EG, UC) added to all 5 prompt templates. New IdentifiedBehavior.Technique field, BehaviorAnalysisResult.TechniqueBreakdown map, AcceptanceCriterion.TechniqueHint. Technique breakdown rendered in analysis presenter and progress page.
  • 034-github-pages-docs — GitHub Pages docs site (Just the Docs theme) at automatetheplanet.github.io/Spectra/. Auto-deploys on push to main via .github/workflows/docs.yml.
  • 033-from-description-chat-flow — doc-aware --from-description flow. UserDescribedGenerator.BuildPrompt() testable; best-effort loads matching docs (cap 3×8000 chars) + criteria; resulting tests get source_refs/criteria (verdict stays manual). Intent routing in SKILL/agent.
  • 027-skill-agent-dedup — agents delegate CLI tasks to SKILL files (execution agent ~400 → 120 lines, generation agent ~219 → 81). Standardized "Step N" format and --no-interaction --output-format json --verbosity quiet flags.
  • 026-criteria-folder-coverage-fix — renamed docs/requirements/ → docs/criteria/ with auto-migration in AnalyzeHandler. Skip _index.*/.criteria.yaml/_criteria_index.yaml from criteria extraction. Dashboard uses AcceptanceCriteriaCoverageAnalyzer.
  • 025-universal-skill-progress — shared ProgressManager + ProgressPhases. All 9 SKILL-wrapped commands write .spectra-result.json; 6 long-running ones write .spectra-progress.html with phase stepper. Renamed Requirements → AcceptanceCriteria.
  • 023-criteria-extraction-overhaul — "requirements" → "acceptance criteria" rename. Per-document iterative extraction → individual .criteria.yaml + _criteria_index.yaml. SHA-256 incremental. Import from YAML/CSV/JSON with Jira/ADO column auto-detection, AI splitting, RFC 2119 normalization. --merge/--replace/--list-criteria modes. spectra-criteria SKILL.
  • 023-copilot-chat-integration — full VS Code Copilot Chat integration. --suite flag, --analyze-only two-step flow, automatic batch generation (groups of 30), .spectra-result.json lifecycle, BehaviorAnalyzer timeout 30s → 2min, 7 bundled SKILLs.
  • 022-bundled-skills — bundled SKILL files + 2 agent prompts created by spectra init in .github/skills/ and .github/agents/. SHA-256 hash tracking via SkillsManifest. New spectra update-skills command, --skip-skills flag.
  • 021-generation-session — four-phase generation flow (Analysis → Generation → Suggestions → User-Described). .spectra/session.json (1h TTL). New UserDescribedGenerator, DuplicateDetector (Levenshtein), SuggestionPresenter. CLI: --from-suggestions, --from-description, --context, --auto-complete.
  • 020-cli-non-interactive — global --output-format human|json and --no-interaction on all commands. Typed Results/ models per command, JsonResultWriter. Standardized exit codes (0/1/2/3).
  • 017-mcp-tool-resilience — MCP run_id/test_handle auto-resolve when only one active run/test exists. New list_active_runs, cancel_all_active_runs tools. Enhanced get_run_history with status filter + summary counts.
  • 015-auto-requirements-extraction — AI extraction of testable requirements via Copilot SDK. RequirementsWriter (YAML merge, duplicate detection, sequential REQ-NNN). CLI: spectra ai analyze --extract-requirements.
  • 014-open-source-ready — README redesign, CI pipeline, NuGet publish pipeline, GitHub issue/PR templates, Dependabot.
  • 013-cli-ux-improvements — NextStepHints after every command. Init prompts for automation dirs + critic model. New spectra config add/remove/list-automation-dir subcommands. Interactive generation continuation menu.
  • 012-dashboard-branding — dashboard branding (company name, logo, favicon, light/dark theme, CSS variable overrides). BrandingInjector + SampleDataFactory. dashboard --preview mode.
  • 011-coverage-overhaul — unified three-type coverage (Documentation, Requirements, Automation). New analyzers/services/models. CLI --auto-link writes automated_by back into test files.
  • 010-smart-test-selection — cross-suite test search via find_test_cases MCP tool. start_execution_run accepts test_ids and selection modes. New get_test_execution_history, list_saved_selections tools.
  • 010-document-index — persistent docs/_index.md with per-doc metadata + SHA-256 incremental updates. spectra docs index [--force]. Auto-refresh before ai generate. GetDocumentMapTool prefers index when available.
  • 009-copilot-sdk-consolidation — unified all AI under GitHub Copilot SDK as the sole runtime. Removed legacy per-provider agent/critic implementations (OpenAiAgent, AnthropicAgent, GitHubModelsAgent, MockAgent, GoogleCritic, OpenAiCritic, AnthropicCritic, GitHubCritic).
  • 008-grounding-verification — dual-model critic flow (generator + verifier), three verdicts (grounded/partial/hallucinated), grounding metadata in YAML frontmatter, configurable critic provider, --skip-critic flag.
  • 007-execution-agent-mcp-tools — added MCP tools for execution agents.
  • 006-conversational-generation — two-mode generation (Direct/Interactive), test updates with classification (UP_TO_DATE, OUTDATED, ORPHANED, REDUNDANT), Spectre.Console UX.
  • 004-test-generation-profile — profile support for test generation settings.
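
Several of the entries above (criteria extraction, bundled SKILL tracking, the document index) mention SHA-256 based incremental updates. The general idea can be sketched as follows: hash each document, compare against the hash stored from the previous run, and reprocess only the mismatches. The manifest shape and the function names are assumptions for illustration and do not reflect the project's actual file formats.

```python
import hashlib


def sha256_of(content: bytes) -> str:
    """Hex digest used as the change fingerprint for one document."""
    return hashlib.sha256(content).hexdigest()


def find_changed(documents: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return paths whose current hash differs from, or is missing in, the manifest."""
    return sorted(
        path
        for path, content in documents.items()
        if manifest.get(path) != sha256_of(content)
    )


# Hypothetical previous-run manifest: path -> SHA-256 hex digest.
manifest = {
    "docs/login.md": sha256_of(b"Login requires a password."),
    "docs/signup.md": sha256_of(b"Signup requires an email."),
}
documents = {
    "docs/login.md": b"Login requires a password.",             # unchanged
    "docs/signup.md": b"Signup requires an email and a name.",  # edited
    "docs/reset.md": b"Reset links expire after one hour.",     # new
}
changed = find_changed(documents, manifest)
# changed == ["docs/reset.md", "docs/signup.md"]
```

Only the paths in `changed` would need to be reprocessed; a forced rebuild amounts to passing an empty manifest.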