refactor(config): fix LLM timeout architecture #746
Pull request overview
This PR refactors the LLM timeout architecture (issue #732) to cleanly separate two concerns: quick-operation timeouts (ping, model listing) and per-pipeline inference timeouts (chat streaming, extraction). Previously, `llm.timeout` served double duty as both an HTTP client timeout and an inference deadline, which conflated two different concerns and created confusing interactions.
Changes:
- Clarified `ResolvedLLM.Timeout` semantics: now represents the inference context deadline; HTTP client timeout is derived as `max(timeout, QuickOpTimeout)` in `NewClient`
- Added chat inference deadline: chat streaming now uses `context.WithTimeout` instead of `context.WithCancel`, enforcing the configured `llm.chat.timeout`
- Deprecated `extraction.llm_timeout`: migrated to `llm.extraction.timeout` with TOML key migration, env var rename, and documentation update; removed the now-redundant `LLMInferenceTimeout` field
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `plans/llm-timeout-architecture.md` | New planning document describing the problem, design, and changes for the timeout refactor |
| `internal/llm/client.go` | `NewClient` now derives HTTP client timeout as `max(timeout, QuickOpTimeout)`; updated doc comment and timeout error message |
| `internal/config/config.go` | Clarified `LLM.Timeout` and `ResolvedLLM.Timeout` doc comments; added `extraction.llm_timeout` TOML migration and `MICASA_EXTRACTION_LLM_TIMEOUT` env var rename entry |
| `internal/app/types.go` | Removed `LLMInferenceTimeout` from `extractionConfig`; `Timeout` field (now annotated as inference context deadline) absorbs its role; updated `SetExtraction` signature |
| `internal/app/model.go` | Removed `llmInferenceTimeout` initialization (field no longer exists) |
| `internal/app/extraction.go` | `llmExtractCmd` now reads `extractionTimeout` directly; `llmPingCmd` uses `client.Timeout()` for the quick-op deadline |
| `internal/app/chat.go` | All three chat stream paths now use `context.WithTimeout` via new `chatInferenceTimeout()` helper |
| `internal/app/extraction_test.go` | Updated test assertion from old `extraction.llm_timeout` key name to new `llm.extraction.timeout` |
| `cmd/micasa/main.go` | Removed the now-redundant `cfg.Extraction.LLMTimeoutDuration()` argument from `SetExtraction` call |
| `docs/content/docs/reference/configuration.md` | Updated all references from old to new config keys; added deprecated env vars table entry |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Separate timeout concerns that were previously conflated:

- `ResolvedLLM.Timeout` is now the inference context deadline (was HTTP client timeout). Per-pipeline overrides (`llm.chat.timeout`, `llm.extraction.timeout`) inherit from the base `llm.timeout`.
- HTTP client timeout is derived as `max(timeout, QuickOpTimeout)` inside `NewClient`, ensuring quick ops aren't killed by short inference timeouts.
- Chat streaming now enforces a context deadline via `WithTimeout` (was `WithCancel` with no deadline at all).
- `extraction.llm_timeout` is deprecated in favor of `llm.extraction.timeout` with TOML key migration and env var rename.
- `extractionConfig.LLMInferenceTimeout` removed; the `Timeout` field serves as the inference deadline directly, eliminating the redundant layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add `extraction.llm_timeout` to `deprecatedPaths` map in `show.go`
- Replace magic `5*time.Minute` with `config.DefaultLLMTimeout` in `chatInferenceTimeout()`
- Add TOML and env var migration tests for `extraction.llm_timeout` -> `llm.extraction.timeout` following existing patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
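The TOML key migration (`extraction.llm_timeout` -> `llm.extraction.timeout`) can be sketched as a transform over the decoded config map. This is a hypothetical illustration: the actual migration machinery in `internal/config` is not shown in this PR excerpt, and the new-key-wins behavior here is an assumption.

```go
package main

import "fmt"

// migrateDeprecated moves extraction.llm_timeout to
// llm.extraction.timeout in a map as produced by a TOML decoder.
// The old key is deleted; an already-set new key is left untouched.
func migrateDeprecated(cfg map[string]any) {
	ext, _ := cfg["extraction"].(map[string]any)
	if ext == nil {
		return
	}
	old, ok := ext["llm_timeout"]
	if !ok {
		return
	}
	llm, _ := cfg["llm"].(map[string]any)
	if llm == nil {
		llm = map[string]any{}
		cfg["llm"] = llm
	}
	llmExt, _ := llm["extraction"].(map[string]any)
	if llmExt == nil {
		llmExt = map[string]any{}
		llm["extraction"] = llmExt
	}
	if _, exists := llmExt["timeout"]; !exists {
		llmExt["timeout"] = old // migrate only when new key is absent
	}
	delete(ext, "llm_timeout")
}

func main() {
	cfg := map[string]any{
		"extraction": map[string]any{"llm_timeout": "90s"},
	}
	migrateDeprecated(cfg)
	timeout := cfg["llm"].(map[string]any)["extraction"].(map[string]any)["timeout"]
	fmt.Println(timeout) // 90s
}
```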
Force-pushed 38c2585 to 0422e3d
## Summary

Closes #732.

- **Clarify timeout semantics**: `ResolvedLLM.Timeout` is now the inference context deadline (was the HTTP client timeout). Per-pipeline overrides (`llm.chat.timeout`, `llm.extraction.timeout`) inherit from the base `llm.timeout`.
- **Derive HTTP client timeout**: `NewClient` computes `max(timeout, QuickOpTimeout)` so quick ops (ping, model listing) aren't killed by short inference timeouts. `QuickOpTimeout` stays as a non-configurable 30s constant.
- **Add chat inference deadline**: Chat streaming now uses `context.WithTimeout` (was `WithCancel` with no deadline), enforcing the configured `llm.chat.timeout`.
- **Deprecate `extraction.llm_timeout`**: Migrated to `llm.extraction.timeout` with TOML key migration and env var rename (`MICASA_EXTRACTION_LLM_TIMEOUT` -> `MICASA_LLM_EXTRACTION_TIMEOUT`).
- **Remove redundant `LLMInferenceTimeout`**: `extractionConfig.Timeout` now serves as the inference deadline directly, eliminating the extra field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>