All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Direct
Testimize1.1.10 NuGet reference inSpectra.CLI.csproj. The library runs in the same process as the CLI — no child process spawn, no JSON-RPC, no MCP handshake. - New
TestimizeRunneratsrc/Spectra.CLI/Agent/Testimize/TestimizeRunner.csorchestrates the in-process call. MapsFieldSpecvalues from behavior analysis ontoTestimizeInputBuilder.Add*calls, mapsconfig.Testimize.Strategy/Modestrings onto the library's enums, and callsTestimizeEngine.Configure(...).Generate(). Runs once per suite between behavior analysis and the batch generation loop (not once per batch). - Behavior-analysis schema extended with a top-level
field_specs[]array. The AI emits structured field constraints (numeric ranges, string lengths, date ranges, enumerated values, required flags, expected error messages) in the same JSON response asbehaviors.BehaviorAnalyzer.ParseFieldSpecsdeserialises it. When the AI returns no field specs,TestimizeRunnerfalls back to the localFieldSpecAnalysisTools.Analyzeregex extractor over the raw documentation. - Test-generation prompt template rewritten: the old
{{#if testimize_enabled}}block that instructed the model to calltestimize/generate_hybrid_test_cases/testimize/generate_pairwise_test_casesis replaced by{{#if testimize_dataset}}, which embeds a literalPRE-COMPUTED ALGORITHMIC TEST DATA (from Testimize, strategy=…)YAML block with the exact values, categories, and expected error messages the engine produced. The model no longer has to "choose" to use Testimize — the values are in the prompt as authoritative facts. - New shared types
Spectra.Core.Models.Testimize.FieldSpec,TestimizeDataset,TestimizeRow,TestimizeCell— contract between behavior analysis,TestimizeRunner, and the prompt embedder. Spectra.Core has no dependency on Testimize itself, keeping the library surface isolated to Spectra.CLI. testimizeSettings.jsonships with the tool atsrc/Spectra.CLI/Templates/testimizeSettings.json. Copied to the tool's output directory soTestimize.FakerFactory.Initializefinds it at runtime (without it, the library throwsNullReferenceExceptionbefore generating anything).TESTIMIZE OK strategy=… fields=… test_data_sets=… elapsed=…slog line in.spectra-debug.log, once per suite, when generation succeeds. ReplacesTESTIMIZE CONFIGURED/LOADED/STATUS_CHANGED/DISPOSED/GATE mcp_loadedlines. New skip reasons:disabled,no_field_specs,insufficient_fields(Testimize requires ≥ 2 parameters). AlsoTESTIMIZE FALLBACK source=regex fields=<n>when the regex extractor recovers from an empty AI response, andTESTIMIZE ERROR exception=<Type> message="…"when the engine throws.
src/Spectra.CLI/Agent/Copilot/McpConfigBuilder.cs+ tests — no longer needed, MCP server registration is gone.src/Spectra.CLI/Agent/Testimize/TestimizeDetector.cs+ tests — thedotnet tool list -gprobe for the old globalTestimize.MCP.Servertool is obsolete.tests/Spectra.CLI.Tests/Agent/Copilot/TestimizeStrategyResolverTests.cs— testedCopilotGenerationAgent.ResolveTestimizeStrategyToolName, which mapped config strategy to MCP tool names. The helper is deleted.- MCP event subscription branches in
CopilotGenerationAgent:SessionMcpServersLoadedEvent,SessionMcpServerStatusChangedEvent,mcpServersLoadedTcsTaskCompletionSource, and the 30-secondGATE mcp_loadedblock added in 1.48.2. mcpServersparameter fromCopilotService.CreateGenerationSessionAsync.TestimizeMcpConfignested class andMcpproperty fromTestimizeConfig. Old configs withtestimize.mcp.command/argsstill parse cleanly — the JSON serializer ignores the unknown keys — but those values no longer affect anything.testimize.mcpblock from the defaultspectra.config.jsontemplate.InstallCommandfield fromTestimizeCheckResult— there is no global tool to install anymore.
TestimizeCheckHandlernow probes theTestimizeassembly via reflection (typeof(Testimize.Usage.TestimizeEngine).Assembly) instead of shelling out todotnet tool list -g. "Installed" means "assembly is loadable in the current process" — always true when Spectra.CLI is running, since it's a compile-timePackageReference.HealthymeansEnabled && Installed.
Across every log from 1.48.1 and 1.48.2 the Copilot SDK consistently fired TESTIMIZE LOADED ~700 ms after BATCH START, never before. The SDK loads MCP servers lazily — the child process doesn't spawn until the first SendAndWaitAsync — so the model's tool catalog is frozen before testimize's tools can reach it. Every attempt to gate the send on SessionMcpServersLoadedEvent timed out because the event cannot fire until the send it's trying to gate. The in-process path sidesteps the entire race: by the time the prompt is built, the test data already exists as strings that go straight into the prompt, and the model uses them verbatim.
- All tests still pass: 514 Core / 351 MCP / 839 CLI (1704 total). New
TestimizeRunnerTestscovers strategy/mode enum mapping, empty-spec skip, insufficient-fields skip, regex fallback recovery, Integer+Email, Date+Integer, and SingleSelect+Integer happy paths. - Testimize prints its output-generator result to stdout and attempts to copy it to the clipboard on every run.
TestimizeRunnertemporarily redirectsConsole.Outduring theGenerate()call so neither corrupts--output-format jsonnor fails on headless CI. - Testimize's generators require ≥ 2 parameters (
Pairwise testing requires at least two parameters.). Single-field suites skip cleanly withTESTIMIZE SKIP reason=insufficient_fields fields=1 ….
ai.critic.max_concurrentconfig option (default1, max20) — runs the critic verification phase in parallel using aSemaphoreSlimthrottle. Withmax_concurrent: 5on a 200-test suite, the critic phase typically completes in ~1/5 of sequential time. Output is unchanged: results are written into a pre-sized array indexed by original position, so test files, indexes, and verdicts come back in the same order regardless of completion order. Manual-verdict short-circuit is preserved. Default of1keeps the byte-identical sequential path..spectra-errors.logdedicated error log — companion to.spectra-debug.log. Written only whendebug.enabled: trueAND at least one error occurs during the run (lazy file creation; clean runs leave it untouched). Captures full exception type, message, response body (truncated to 500 chars),Retry-Afterheader, and stack trace. Newdebug.error_log_fileconfig (default.spectra-errors.log) and matchingmode(append/overwrite) following the debug log. Errors are captured at every catch site that talks to the AI runtime:BehaviorAnalyzer,CopilotGenerationAgent,CopilotCritic,CriteriaExtractor(viaAnalyzeHandler), andUpdateHandler. Each captured error gets asee=.spectra-errors.logcross-reference suffix on the corresponding debug log line.- Rate limit + error counts in Run Summary — new
Errors,Rate limits, andCritic concurrencyrows in the Spectre.Console panel. WhenRate limits > 0, a hint(consider reducing ai.critic.max_concurrent)appears next to the count. The same counts suffix theRUN TOTALdebug log line asrate_limits=<n> errors=<n>(always present, even on zero runs, for grep-friendly CI consumption). ErrorLoggerstatic class atsrc/Spectra.CLI/Infrastructure/ErrorLogger.cs— mirrorsDebugLogger's static-class pattern. Thread-safe via single lock. File-write failures are caught, emit one stderr warning, then disable further writes for the run (graceful degradation; never aborts the run). IncludesIsRateLimit(Exception)classifier (HTTP 429 /HttpStatusCode.TooManyRequests/ message-substring fallbacks for429,rate limit,too many requests,rate_limit_exceeded).RunErrorTrackerinstance class atsrc/Spectra.CLI/Services/RunErrorTracker.cs— per-run counters withErrors+RateLimitsproperties; thread-safe viaInterlocked. Lives onGenerateHandlerandUpdateHandlerand is forwarded intoBehaviorAnalyzer,CopilotGenerationAgent,CriticFactory.TryCreate, andCopilotCriticconstructor.- +14 new tests —
CriticConfigClampTests(clamping ≤0→1, >20→20, in-range pass-through, JSON roundtrip),ErrorLoggerTests(file lazy-creation, disabled no-op, stack trace + response body capture, response truncation at 500, retry-after, multi-error append, concurrent-write thread safety,IsRateLimitclassifier),RunErrorTrackerTests(counter atomicity under 1000-task contention), and 2 newRunSummaryDebugFormatterTestscases for the error/rate-limit suffixes.
RunSummaryDebugFormatter.FormatRunTotalgained an optionalRunErrorTrackerparameter; line now always ends withrate_limits=<n> errors=<n>.RunSummaryPresenter.Rendergained optionalerrorTrackerandcriticConcurrencyparameters that drive the new rows.CopilotCritic.VerifyTestAsyncdistinguishes rate-limit failures from generic exceptions in the debug log (CRITIC RATE_LIMITEDvs.CRITIC ERROR).CriticFactory.TryCreateandTryCreateAsyncaccept an optionalRunErrorTrackerparameter and forward it toCopilotCritic.AgentFactory.CreateAgentAsyncaccepts an optionalRunErrorTrackerparameter and forwards it toCopilotGenerationAgent.
max_concurrent > 10emits a one-line stderr warning at run start about provider rate-limit risk.max_concurrent > 20emits a "clamped to 20" warning.- The error log uses lazy file creation: on a clean run, no
.spectra-errors.logfile is touched, so its mere existence indicates "this run had at least one failure". - All existing tests still pass (513 Core / 351 MCP / 827 CLI). Adds 14 new tests.
TOOL CALLlines in.spectra-debug.logfor the generation flow.CopilotGenerationAgentnow subscribes toToolExecutionStartEvent+ToolExecutionCompleteEventfrom the Copilot SDK and emits one line per tool invocation, with elapsed time computed by correlating start/complete viaToolCallId. Format:TOOL CALL tool=<name> mcp_server=<server-or--> elapsed=<sec>s success=<bool>. For MCP tools the SDK exposesMcpServerName+McpToolNamedirectly, so testimize calls show astool=generate_hybrid_test_cases mcp_server=testimize elapsed=1.23s success=true— making it possible to verify (a) whether the AI is actually invoking testimize tools during generation, or just ignoring them, and (b) which built-in tools (ReadDocument, GetNextTestIds, BatchWriteTests, ...) take how long. Previously only the testimize lifecycle (CONFIGURED/LOADED/STATUS_CHANGED/DISPOSED) was logged, with no visibility into per-call usage.
- Tool-call tracking is per-session, scoped to a single
GenerateTestsAsynccall. The correlation dictionary isConcurrentDictionary<string, ...>since SDK events fire from internal tasks. - Only added to the generation agent (
CopilotGenerationAgent). The other Copilot-SDK agents —BehaviorAnalyzer,CriteriaExtractor,GroundingAgent(critic) — don't currently emitTOOL CALLlines. They could be extended the same way if needed; the only flow where MCP tools (testimize) are wired in is generation, which is what this change targets.
- Live progress bars for
spectra ai generateandspectra ai update(spec 041). Long runs (--count 40+) now show real progress feedback in two places:- Terminal: a
Generating testsSpectre.Console progress bar advances per batch, followed by aVerifying testsbar that increments once per critic call (showing the most recent test ID and verdict). Forupdate, anUpdating testsbar advances per applied proposal. - Browser progress page (
.spectra-progress.html): a new live progress section renders the same data driven by an in-flightprogressobject inside.spectra-result.json, picked up on the existing 1.5s auto-refresh cycle. Verification bar is dimmed during the generation phase, then activated.
- Terminal: a
progressfield in.spectra-result.json— present only while a run is in flight (snapshot of phase, target, generated, verified, current batch, total batches, last test ID, last verdict). Removed automatically byProgressManager.Complete()/Fail()so the final result file showsrunSummaryinstead. Schema documented inspecs/041-progress-bars/contracts/result-json-progress-schema.md.ProgressReporter.ProgressTwoTaskAsync(...)— new wrapper that creates two sequential Spectre tasks inside a single live region. Handles passed to the action implement a smallIProgressTaskHandleabstraction so command handlers don't take a direct dependency on Spectre internals.ProgressSnapshot+ProgressPhasemodel atsrc/Spectra.CLI/Progress/ProgressSnapshot.cs.- +17 unit tests —
ProgressBarTests,ProgressManagerProgressFieldTests,ProgressPageProgressBarTestscovering suppression rules, progress field clear-on-complete/fail, and HTML rendering for generation/verifying/updating phases.
ProgressReportersuppression rules — now suppresses progress bars not just under--output-format json, but also under--verbosity quietand when stdout is non-interactive (redirected, piped, or no TTY). SKILL/CI invocations remain ANSI-free.ProgressReporter.StatusAsync— short-circuits when the reporter is inside an activeProgressTwoTaskAsyncblock, so existing inline_progress.StatusAsync(...)spinners inside the batch loop don't conflict with the outer progress bars.UpdateHandler.ApplyChangesAsync— gained an optionalonProposalAppliedcallback parameter for per-proposal progress reporting. Existing call sites unaffected.
- ETA / time-remaining display, per-test progress within a single generation batch, and formal cancel-with-cleanup on Ctrl+C are explicitly out of scope. See
specs/041-progress-bars/spec.md. --output-format jsonand--verbosity quietbehavior is unchanged: SKILLs and CI scripts see no progress-bar output on stdout.
- False
TESTIMIZE UNHEALTHY no_load_event_within_3sline in debug log. The 3-second grace window inGenerationAgentwas based on a wrong assumption about how fast the Copilot CLI's MCP client handshakes with a server — in reality it takes ~60s to attempt + time out an initialize against a non-spec-compliant server. Removed the grace window entirely. The SDK'sSessionMcpServersLoadedEventis the single source of truth; when it fires (success or failure), the real status is logged with the real error message. Generation proceeds immediately because the SDK does not block session creation on MCP loading. - Session-companion fix in
Testimize.MCP.Server(separate repo): the server previously emitted JSON-RPC responses with both"result"and"error":null, which is a JSON-RPC 2.0 spec violation — the spec says a response MUST containresultORerror, not both. The Copilot CLI's MCP client (built on@modelcontextprotocol/sdk) correctly rejects malformed responses and times out at ~60s withMCP error -32001: Request timed out. Fix: serialize only the field that applies. SeeTestimize.MCP.Server/Program.cschange; requires rebuilding testimize-mcp 1.1.10 from source and reinstalling.
- Zero code-path changes beyond the 3s grace removal — token tracking, debug log formatting, run summary,
RUN TOTALline, and the nativeSessionConfig.McpServerswiring from v1.46.0 are all untouched. - The SDK's behavior (blocking vs non-blocking session creation on MCP load) was empirically verified by observing the timing in
.spectra-debug.log: session creation completed immediately, generation ran in parallel with the MCP handshake, and the load event arrived ~68s later withstatus=Failed.
- Native MCP integration for Testimize —
spectra ai generatenow passes Testimize to the Copilot SDK via its nativeSessionConfig.McpServersAPI. The SDK owns the full lifecycle: process spawn,initializehandshake, framing variant (Content-Length OR newline-delimited), tool discovery, cost attribution (AssistantUsageData.Initiator = "mcp-sampling"), permission prompts, and OAuth. No more custom JSON-RPC protocol client. McpConfigBuilder.BuildTestimizeServer(TestimizeConfig)— small static helper that translates the user-facingtestimizeconfig block into anMcpLocalServerConfigfor the SDK. Unit-tested in isolation.CopilotService.CreateGenerationSessionAsync(..., mcpServers)— new optional parameter on the session factory that forwards an MCP server map intoSessionConfig.McpServers. Existing callers unaffected.- SDK-driven testimize lifecycle in
.spectra-debug.log— the agent subscribes toSessionMcpServersLoadedEventandSessionMcpServerStatusChangedEventand writesTESTIMIZE CONFIGURED / LOADED / STATUS_CHANGED / DISPOSEDlines driven by the SDK's own status enum (Connected / Failed / NeedsAuth / Pending / Disabled / NotConfigured). No more 5-second probe hangs. testimize_strategyprompt placeholder — thetest-generation.mdtemplate now embeds the preferred testimize tool name (derived fromtestimize.strategy) so the AI knows whether to prefertestimize/generate_hybrid_test_cases(HybridArtificialBeeColony, default) ortestimize/generate_pairwise_test_cases(fast Pairwise).- Defensive 3-second grace window — if testimize is configured but the SDK hasn't reported a load status by the time the handler is ready to send the prompt, the agent logs
TESTIMIZE UNHEALTHY reason=no_load_event_within_3sand continues. The AI will still succeed, just without testimize's optimized test data. - +17 tests —
McpConfigBuilderTests(7 cases) andTestimizeStrategyResolverTests(4 theory cases + 1 fact + 1 edge), plus updates toTestimizeConditionalBlockTestsandAnalyzeFieldSpecTeststo reflect the new tool names.
- Testimize health probe 5-second hang. Root cause: our custom
TestimizeMcpClientusedContent-Length: N\r\n\r\n<body>MCP-spec-v1 framing, butTestimize.MCP.Serveruses newline-delimited JSON (ReadLineAsync/WriteLineAsync). Worse, Spectra wrote the JSON body with no trailing\n, so testimize'sReadLineAsyncblocked forever waiting for one — while Spectra's ownReadLineAsyncblocked waiting for a response. Exactly 5 seconds later, the probe timeout fired. Deleting the client and delegating to the SDK's native MCP implementation fixes this class of bug permanently (the SDK handles both framing variants correctly).
src/Spectra.CLI/Agent/Testimize/TestimizeMcpClient.cs— the 240-line custom JSON-RPC-over-stdio client with the Content-Length framing bug. Gone.src/Spectra.CLI/Agent/Testimize/TestimizeTools.cs— theCreateGenerateTestDataToolAIFunctionwrapper that routed to the broken client. Gone. The AI now calls the real testimize MCP tools directly.- Tests for the deleted classes —
TestimizeMcpClientTests,TestimizeMcpClientGracefulTests,TestimizeToolsTests. Also deleted one obsolete test fromTestimizeCheckHandlerTests(Check_EnabledButBogusCommand_...) that asserted old-client behavior. spectra testimize check's "healthy" field now just mirrorsinstalled— the real runtime health check moved tospectra ai generatewhere the debug log'sTESTIMIZE LOADED server=testimize status=Connectedline is the authoritative signal.
FieldSpecAnalysisTools.cs(new file, replaces the surviving half ofTestimizeTools.cs) — containsCreateAnalyzeFieldSpecTool+ the local regex-basedExtractFieldshelper. No behavior change; same signatures, new class name scoped to what the code actually does.test-generation.mdprompt template — the{{#if testimize_enabled}}block now names both real testimize MCP tools explicitly, references the new{{testimize_strategy}}placeholder, and instructs the AI to callAnalyzeFieldSpecfirst for prose-described fields.behavior-analysis.mdprompt template — same update: referencestestimize/generate_hybrid_test_cases/testimize/generate_pairwise_test_casesinstead of the deletedGenerateTestDatawrapper name.
- The user-facing config schema (
testimize.enabled,testimize.mcp.command,testimize.mcp.args,testimize.strategy,testimize.mode) is unchanged. Existingspectra.config.jsonfiles continue to work with no edits. TestimizeDetector.IsInstalledAsync()is unchanged — still checksdotnet tool list -gfortestimize.mcp.server.
- Run header in
.spectra-debug.log— everygenerate/updaterun now writes a header block at the start containing the SPECTRA version, the full user command line, and an ISO-8601 UTC timestamp:Makes each run self-identifying when grepping or comparing logs across days.────────────────────────────────────────────────────────────── === SPECTRA v1.45.2 | spectra ai generate checkout --count 20 | 2026-04-11T14:30:00Z === debug.modeconfig option with two values:"append"(default): keeps existing log content and prepends a horizontal separator + header before each new run. Good for comparing multiple runs post-hoc."overwrite": truncates.spectra-debug.logat run start so only the latest run is preserved. Good for focused troubleshooting. Unknown values fall back to"append". Wired fromGenerateHandlerandUpdateHandlerat run start via newDebugLogger.BeginRun().
- New
DebugLogger.BeginRun()static method that materializes the header and honors the mode. Best-effort — never throws, no-op when debug logging is disabled. - Version is read via
AssemblyInformationalVersionAttributeso the header reflects the actual installed build. - Command line is reconstructed from
Environment.GetCommandLineArgs()(skipping the dotnet tool shim path) so it looks like what the user typed. - +11 tests:
BeginRun_Disabled_NoFileWritten,BeginRun_Overwrite_TruncatesAndWritesHeader,BeginRun_Append_ExistingFile_PrependsSeparatorAndHeader,BeginRun_Append_MissingFile_CreatesWithHeader,BeginRun_UnknownMode_TreatedAsAppend,BeginRun_HeaderIncludesIsoTimestamp,BeginRun_ThenAppendAi_LogContainsHeaderAndAiLine,BeginRun_ThenAppendTestimize_LinesSurvive(regression test for the mode=overwrite truncate), plusDebugConfigTests.Default_Mode_IsAppendand two deserialization round-trip tests.
.spectra-debug-analysis.txt— no longer written byBehaviorAnalyzeronANALYSIS PARSE_FAIL. The parse-fail reason is still logged to.spectra-debug.log..spectra-debug-response.txt— no longer written byGenerationAgentafter each batch. TheSaveDebugResponsemethod and its call site have been deleted. The stale "Check .spectra-debug-response.txt for the raw AI output" error message now points at.spectra-debug.log.- All debug output now flows through a single file:
.spectra-debug.log.
- Testimize lifecycle lines (
TESTIMIZE DISABLED/START/NOT_INSTALLED/UNHEALTHY/HEALTHY/DISPOSED) are written viaDebugLogger.Append("generate", ...)afterBeginRun, so they survive themode=overwritetruncate. Covered byBeginRun_ThenAppendTestimize_LinesSurvive.
- Real token counts from
AssistantUsageEvent(spec 040 follow-up) — the Copilot SDK surfaces provider-reportedInputTokens/OutputTokenson a separateAssistantUsageEvent(not onAssistantMessageEventas Spectra previously assumed). Every AI call site (BehaviorAnalyzer, GenerationAgent batch loop, GroundingAgent critic, CriteriaExtractor — both methods) now subscribes to this event via a newCopilotUsageObserverhelper, waits a 200ms grace after the request for ordering-safe capture, and records the real counts intoTokenUsageTracker. No more?placeholders in the normal case. text.Length / 4fallback when the usage event fails to arrive within the grace window — captured by the newTokenEstimatorservice. Fallback values are flagged end-to-end:~tokens_in=…/~tokens_out=…in the debug log and"estimated": trueon both thetoken_usagereport and each affected phase DTO in JSON.RUN TOTALsummary line written to.spectra-debug.logat the end of everygenerate/updaterun, via the newRunSummaryDebugFormatter. Format:RUN TOTAL command=generate suite=checkout calls=24 tokens_in=64480 tokens_out=24240 elapsed=2m45s phases=analysis:1/18.5s,generation:3/52.3s,critic:20/1m34s. Makes the log self-summarizing — one grep and you have the run-level numbers. When any phase used the estimate fallback, the totals gain~prefixes.- New services:
TokenEstimator,CopilotUsageObserver,RunSummaryDebugFormatter. - +36 tests (TokenEstimatorTests, CopilotUsageObserverTests, RunSummaryDebugFormatterTests, TokenUsageReportEstimatedTests, plus extensions to TokenUsageTrackerTests, DebugLoggerTests, GenerateResultTokenUsageTests).
PhaseUsage,TokenUsageTracker.Record,TokenUsageReport,PhaseUsageDto, andDebugLogger.AppendAiall carry a newbool estimatedfield / parameter. The flag propagates via logical OR during aggregation — any estimated call in a phase flags the whole phase (and the run total) as estimated. Backwards compatible: the parameter defaults tofalse.UpdateHandleremits theRUN TOTALline withcalls=0 phases=as a run-boundary marker (no AI calls today — still useful when grepping the log).
- Event ordering between
AssistantUsageEventandSessionIdleEventis undocumented in the SDK. The 200ms grace viaTaskCompletionSourcehandles both orderings: returns immediately if usage arrived before idle, waits briefly if it arrives after. On timeout, the length-based estimate kicks in. AssistantUsageData.InputTokens/OutputTokensaredouble?in the SDK assembly (notintas the XML doc implies) — we cast with(int)(value ?? 0).
- Interactive model preset menu (spec 041) —
spectra init -igains a new "AI Model Preset" step that offers four curated generator + critic pairings: (1) GPT-4.1 + GPT-5 mini (free, unlimited), (2) Claude Sonnet 4.5 + GPT-4.1 critic (premium, high quality), (3) GPT-4.1 + Claude Haiku 4.5 critic (free gen + cross-family critic), (4) Custom — writes preset 1 defaults so you can editspectra.config.jsonby hand. Non-custom presets rewrite bothai.providers[0]andai.criticin a single step and skip the separate granular critic wizard. - Embedded
spectra.config.jsontemplate — the init config template is now an<EmbeddedResource>sospectra initproduces the same file whether run from published (single-file) builds or from dev source. Previously the fallback path usedConfigLoader.GenerateDefaultConfig()which omitted the critic block entirely.
- New default models (spec 041) —
spectra initnow writesgpt-4.1(generator) +gpt-5-mini(critic) instead ofgpt-4o/gpt-4o-mini. Both are 0× multiplier on any paid Copilot plan, and they're from different model architectures so the dual-model critic (spec 008) provides genuine cross-verification.SpectraConfig.Default,CriticConfig.GetEffectiveModel(),CopilotCritic.GetEffectiveModel(), andCopilotService.GetCriticModel()all reflect the new defaults. Per-provider defaults:github-models/openai/azure-openai→gpt-5-mini;anthropic/azure-anthropic→claude-haiku-4-5. spectra init -igranular critic step provider menu updated: droppedgoogle(hard-error since spec 039), addedgithub-models, updatedanthropic/openai/azure-openai/azure-anthropicdefaults to current model strings.
- Existing
spectra.config.jsonfiles withgpt-4o/gpt-4o-mini(or any other valid model) continue to work unchanged. The Copilot SDK still routes those models. No migration is required — the new defaults only affect freshspectra initruns.
- Token usage tracking & Run Summary (spec 040) — every
spectra ai generateandspectra ai updaterun now prints a Run Summary panel (documents, behaviors, tests, verdict breakdown, duration) plus a per-phase Token Usage table grouped by(phase, model, provider)with an estimated USD cost for BYOK providers.github-modelsrenders "Included in Copilot plan (rate limits apply)". NewTokenUsageTracker(thread-safe, one instance per run) records every AI call across analysis, generation, critic, and criteria phases. Hardcoded rate table inCostEstimatorcovers gpt-4o, gpt-4o-mini, gpt-4.1 family, claude-sonnet-4, claude-3-5-haiku, deepseek-v3.2. run_summaryandtoken_usagein JSON output and.spectra-result.json— SKILLs and the live progress page read the same fields, sospectra-generatereports token totals + cost in its final summary message. The.spectra-progress.htmlpage gains an AI Calls / Tokens In / Tokens Out / Total / AI Time card grid that updates as batches complete.- Model + provider in every AI debug log line — debug log format gains
model=<name> provider=<name> tokens_in=<n|?> tokens_out=<n|?>suffix on all AI lines (ANALYSIS, BATCH, CRITIC, UPDATE, CRITERIA). Non-AI lines (testimize lifecycle, file I/O) are unchanged. debugconfig section (spec 040) — new top-leveldebugblock inspectra.config.jsonwithenabled(bool, default false) andlog_file(string, default.spectra-debug.log).spectra initwrites the section disabled by default.--verbosity diagnosticforce-enables debug logging for a single run without editing config.
.spectra-debug.logis now opt-in. Default Spectra runs produce zero debug log files, eliminating stale-file accumulation in CI environments. Existing configs without adebugsection continue to load (no breaking change).- Unified
TokenUsagerecord now lives inSpectra.Core.ModelswithPromptTokens/CompletionTokensfields (replaces the unused CLI-scopedInputTokens/OutputTokensrecord).
ai.debug_log_enabledhas been removed fromAiConfig. Use the new top-leveldebug.enabledfield instead. Existing configs that still setai.debug_log_enabledare silently ignored (System.Text.Json default behavior for unknown fields).
- The Copilot SDK does not currently surface
usage.prompt_tokens/completion_tokenson its response object, so token counts are recorded asnull(rendered as?in the debug log and0in the Run Summary table). All other Run Summary fields — call count, per-phase elapsed time, model, provider — are captured for every AI call. When the SDK begins exposing usage, a single read point in each agent picks it up with no further wiring.
- Tolerant analysis JSON parser (recovers from truncated reasoning-model output).
- Shared
DebugLoggerwritesCRITIC START/OK/TIMEOUT/ERRORandTESTIMIZElifecycle lines to.spectra-debug.log. - Critic now honors
critic.timeout_seconds(default bumped 30 → 120). - Progress page polls terminal pages every 5s for fresh-run detection.
- Better
analysis_failederror message.
- Configurable
ai.analysis_timeout_minutes(default 2). [analyze]lines in.spectra-debug.log.spectra-generateSKILL handlesanalysis_failedstatus.
- Analyze-only path returns
status: "analysis_failed"with explanatory message instead of fakeanalyzed/15-recommended.
- Configurable per-batch generation timeout & batch size:
ai.generation_timeout_minutes(default 5),ai.generation_batch_size(default 30),ai.debug_log_enabled(default true). - Per-batch
BATCH START/OK/TIMEOUTlines in.spectra-debug.log. - Improved timeout error with remediation snippets.
- Quickstart SKILL & USAGE.md (spec 032) —
spectra-quickstartis the 12th bundled SKILL: workflow-oriented onboarding for Copilot Chat triggered by phrases like "help me get started", "tutorial", "walk me through". Presents 12 SPECTRA workflows with example conversations. CompanionUSAGE.mdwritten to project root byspectra initas an offline workflow reference. Both hash-tracked byupdate-skills. Generation and execution agent prompts defer onboarding requests to the new SKILL. - Visible default profile format & customization guide (spec 031) —
profiles/_default.yamlandCUSTOMIZATION.mdare now created byspectra initand bundled as embedded resources. Profile format is visible/editable instead of hardcoded. NewProfileFormatLoader.LoadEmbeddedDefaultYaml()andLoadEmbeddedCustomizationGuide()methods. - Customizable root prompt templates (spec 030) —
.spectra/prompts/directory with 5 markdown templates (behavior-analysis, test-generation, criteria-extraction, critic-verification, test-update) controlling all AI operations. Templates use{{placeholder}},{{#if}},{{#each}}syntax. Newanalysis.categoriesconfig section with 6 default categories. Newspectra prompts list/show/reset/validateCLI commands. Newspectra-promptsSKILL (11th bundled SKILL). ProfileFormatLoader.LoadEmbeddedUsageGuide()for resolving the bundledUSAGE.mdcontent.- Bumped Spectre.Console 0.54.0 → 0.55.0, GitHub.Copilot.SDK 0.2.0 → 0.2.1, Markdig 1.1.1 → 1.1.2 (Dependabot PRs #10, #11, #12).
spectra-updateSKILL (10th bundled SKILL) for test update workflow via Copilot Chat- Agent delegation tables updated for update command routing
UpdateResultextended withtotalTests,testsFlagged,flaggedTests,duration,successfields
- Dashboard test file path resolution & version bump
- Criteria coverage, index generation, dashboard SKILL & CI pragma fixes
- Criteria generation, polling, progress caching & CI build fixes
- Refined SPECTRA agents/skills CLI workflows
- Coverage semantics fix & criteria-generation pipeline (spec 028)
TestCaseParsernow propagatesCriteriafield from frontmatter toTestCase- Criteria loading wired into
GenerateHandlerfor per-doc.criteria.yamlcontext
- Docs index SKILL integration, progress page, coverage fix & terminology rename (spec 024)
spectra-docsSKILL (9th bundled SKILL) with structured tool-call-sequence--skip-criteriaflag for docs index command
- Dashboard coverage null-crash fix with zero-state defaults
Spec entries that predate the semver [N.N.N] release format. Listed newest-first by spec number.
- 039-unify-critic-providers — critic provider list aligned with generator (canonical 5: github-models, azure-openai, azure-anthropic, openai, anthropic).
github→github-modelslegacy alias with stderr warning;googleis now hard error. NewDefaultProviderconstant; case-insensitive resolution. - 038-testimize-integration — optional Testimize.MCP.Server integration for algorithmic test data optimization (BVA/EP/pairwise/ABC). Disabled by default. New
testimizeconfig section,TestimizeMcpClient, two AI tools,spectra testimize checkcommand. Graceful degradation everywhere. - 037-istqb-test-techniques — ISTQB black-box techniques (EP, BVA, DT, ST, EG, UC) added to all 5 prompt templates. New
IdentifiedBehavior.Techniquefield,BehaviorAnalysisResult.TechniqueBreakdownmap,AcceptanceCriterion.TechniqueHint. Technique breakdown rendered in analysis presenter and progress page. - 034-github-pages-docs — GitHub Pages docs site (Just the Docs theme) at
automatetheplanet.github.io/Spectra/. Auto-deploys on push to main via.github/workflows/docs.yml. - 033-from-description-chat-flow — doc-aware
--from-descriptionflow.UserDescribedGenerator.BuildPrompt()testable; best-effort loads matching docs (cap 3×8000 chars) + criteria; resulting tests getsource_refs/criteria(verdict staysmanual). Intent routing in SKILL/agent. - 027-skill-agent-dedup — agents delegate CLI tasks to SKILL files (execution agent ~400 → 120 lines, generation agent ~219 → 81). Standardized "Step N" format and
--no-interaction --output-format json --verbosity quietflags. - 026-criteria-folder-coverage-fix — renamed
docs/requirements/→docs/criteria/with auto-migration inAnalyzeHandler. Skip_index.*/.criteria.yaml/_criteria_index.yamlfrom criteria extraction. Dashboard usesAcceptanceCriteriaCoverageAnalyzer. - 025-universal-skill-progress — shared
ProgressManager+ProgressPhases. All 9 SKILL-wrapped commands write.spectra-result.json; 6 long-running ones write.spectra-progress.htmlwith phase stepper. RenamedRequirements→AcceptanceCriteria. - 023-criteria-extraction-overhaul — "requirements" → "acceptance criteria" rename. Per-document iterative extraction → individual
.criteria.yaml+_criteria_index.yaml. SHA-256 incremental. Import from YAML/CSV/JSON with Jira/ADO column auto-detection, AI splitting, RFC 2119 normalization.--merge/--replace/--list-criteriamodes.spectra-criteriaSKILL. - 023-copilot-chat-integration — full VS Code Copilot Chat integration.
--suiteflag,--analyze-onlytwo-step flow, automatic batch generation (groups of 30),.spectra-result.jsonlifecycle,BehaviorAnalyzertimeout 30s → 2min, 7 bundled SKILLs. - 022-bundled-skills — bundled SKILL files + 2 agent prompts created by
spectra initin.github/skills/and.github/agents/. SHA-256 hash tracking viaSkillsManifest. Newspectra update-skillscommand,--skip-skillsflag. - 021-generation-session — four-phase generation flow (Analysis → Generation → Suggestions → User-Described).
.spectra/session.json(1h TTL). NewUserDescribedGenerator,DuplicateDetector(Levenshtein),SuggestionPresenter. CLI:--from-suggestions,--from-description,--context,--auto-complete. - 020-cli-non-interactive — global
--output-format human|jsonand--no-interactionon all commands. TypedResults/models per command,JsonResultWriter. Standardized exit codes (0/1/2/3). - 017-mcp-tool-resilience — MCP
run_id/test_handleauto-resolve when only one active run/test exists. Newlist_active_runs,cancel_all_active_runstools. Enhancedget_run_historywith status filter + summary counts. - 015-auto-requirements-extraction — AI extraction of testable requirements via Copilot SDK.
RequirementsWriter(YAML merge, duplicate detection, sequentialREQ-NNN). CLI:spectra ai analyze --extract-requirements. - 014-open-source-ready — README redesign, CI pipeline, NuGet publish pipeline, GitHub issue/PR templates, Dependabot.
- 013-cli-ux-improvements —
NextStepHintsafter every command. Init prompts for automation dirs + critic model. Newspectra config add/remove/list-automation-dirsubcommands. Interactive generation continuation menu. - 012-dashboard-branding — dashboard branding (company name, logo, favicon, light/dark theme, CSS variable overrides).
BrandingInjector+SampleDataFactory.dashboard --previewmode. - 011-coverage-overhaul — unified three-type coverage (Documentation, Requirements, Automation). New analyzers/services/models. CLI
--auto-linkwritesautomated_byback into test files. - 010-smart-test-selection — cross-suite test search via
find_test_casesMCP tool.start_execution_runacceptstest_idsandselectionmodes. Newget_test_execution_history,list_saved_selectionstools. - 010-document-index — persistent
docs/_index.mdwith per-doc metadata + SHA-256 incremental updates.spectra docs index [--force]. Auto-refresh beforeai generate.GetDocumentMapToolprefers index when available. - 009-copilot-sdk-consolidation — unified all AI under GitHub Copilot SDK as the sole runtime. Removed legacy per-provider agent/critic implementations (OpenAiAgent, AnthropicAgent, GitHubModelsAgent, MockAgent, GoogleCritic, OpenAiCritic, AnthropicCritic, GitHubCritic).
- 008-grounding-verification — dual-model critic flow (generator + verifier), three verdicts (grounded/partial/hallucinated), grounding metadata in YAML frontmatter, configurable critic provider,
--skip-criticflag. - 007-execution-agent-mcp-tools — added MCP tools for execution agents.
- 006-conversational-generation — two-mode generation (Direct/Interactive), test updates with classification (UP_TO_DATE, OUTDATED, ORPHANED, REDUNDANT), Spectre.Console UX.
- 004-test-generation-profile — profile support for test generation settings.