
refactor(config): fix LLM timeout architecture#746

Merged
cpcloud merged 2 commits into main from worktree-silly-noodling-wombat
Mar 10, 2026
Conversation


@cpcloud cpcloud commented Mar 10, 2026

Summary

Closes #732.

- **Clarify timeout semantics**: `ResolvedLLM.Timeout` is now the inference context deadline (was the HTTP client timeout). Per-pipeline overrides (`llm.chat.timeout`, `llm.extraction.timeout`) inherit from the base `llm.timeout`.
- **Derive HTTP client timeout**: `NewClient` computes `max(timeout, QuickOpTimeout)` so quick ops (ping, model listing) aren't killed by short inference timeouts. `QuickOpTimeout` stays a non-configurable 30s constant.
- **Add chat inference deadline**: Chat streaming now uses `context.WithTimeout` (was `WithCancel` with no deadline), enforcing the configured `llm.chat.timeout`.
- **Deprecate `extraction.llm_timeout`**: Migrated to `llm.extraction.timeout` with TOML key migration and env var rename (`MICASA_EXTRACTION_LLM_TIMEOUT` -> `MICASA_LLM_EXTRACTION_TIMEOUT`).
- **Remove redundant `LLMInferenceTimeout`**: `extractionConfig.Timeout` now serves as the inference deadline directly, eliminating the extra field.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 10, 2026 12:20

codecov Bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 61.22449% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.61%. Comparing base (847153e) to head (0422e3d).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/app/types.go | 0.00% | 10 Missing ⚠️ |
| internal/app/chat.go | 30.00% | 6 Missing and 1 partial ⚠️ |
| internal/app/extraction.go | 50.00% | 2 Missing ⚠️ |
Additional details and impacted files
| Files with missing lines | Coverage Δ |
|---|---|
| cmd/micasa/main.go | 3.33% <ø> (+0.01%) ⬆️ |
| internal/app/model.go | 62.36% <100.00%> (-0.02%) ⬇️ |
| internal/config/config.go | 88.02% <100.00%> (-1.62%) ⬇️ |
| internal/config/show.go | 81.69% <ø> (ø) |
| internal/llm/client.go | 91.40% <100.00%> (+0.05%) ⬆️ |
| internal/app/extraction.go | 70.71% <50.00%> (-0.05%) ⬇️ |
| internal/app/chat.go | 72.34% <30.00%> (-0.13%) ⬇️ |
| internal/app/types.go | 52.94% <0.00%> (+0.61%) ⬆️ |

... and 1 file with indirect coverage changes



Copilot AI left a comment


Pull request overview

This PR refactors the LLM timeout architecture (issue #732) to cleanly separate two concerns: quick-operation timeouts (ping, model listing) and per-pipeline inference timeouts (chat streaming, extraction). Previously, llm.timeout served double duty as both an HTTP client timeout and an inference deadline, which conflated two different concerns and created confusing interactions.

Changes:

  • Clarified ResolvedLLM.Timeout semantics: now represents the inference context deadline; HTTP client timeout is derived as max(timeout, QuickOpTimeout) in NewClient
  • Added chat inference deadline: chat streaming now uses context.WithTimeout instead of context.WithCancel, enforcing the configured llm.chat.timeout
  • Deprecated extraction.llm_timeout: migrated to llm.extraction.timeout with TOML key migration, env var rename, and documentation update; removed the now-redundant LLMInferenceTimeout field

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| plans/llm-timeout-architecture.md | New planning document describing the problem, design, and changes for the timeout refactor |
| internal/llm/client.go | `NewClient` now derives HTTP client timeout as `max(timeout, QuickOpTimeout)`; updated doc comment and timeout error message |
| internal/config/config.go | Clarified `LLM.Timeout` and `ResolvedLLM.Timeout` doc comments; added `extraction.llm_timeout` TOML migration and `MICASA_EXTRACTION_LLM_TIMEOUT` env var rename entry |
| internal/app/types.go | Removed `LLMInferenceTimeout` from `extractionConfig`; `Timeout` field (now annotated as inference context deadline) absorbs its role; updated `SetExtraction` signature |
| internal/app/model.go | Removed `llmInferenceTimeout` initialization (field no longer exists) |
| internal/app/extraction.go | `llmExtractCmd` now reads `extractionTimeout` directly; `llmPingCmd` uses `client.Timeout()` for the quick-op deadline |
| internal/app/chat.go | All three chat stream paths now use `context.WithTimeout` via new `chatInferenceTimeout()` helper |
| internal/app/extraction_test.go | Updated test assertion from old `extraction.llm_timeout` key name to new `llm.extraction.timeout` |
| cmd/micasa/main.go | Removed the now-redundant `cfg.Extraction.LLMTimeoutDuration()` argument from `SetExtraction` call |
| docs/content/docs/reference/configuration.md | Updated all references from old to new config keys; added deprecated env vars table entry |
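
The key migration described above can be illustrated with a config fragment. The key paths come from the PR; the duration value shown is a hypothetical example, not a documented default:

```toml
# Before (deprecated; migrated automatically):
[extraction]
llm_timeout = "2m"

# After:
[llm.extraction]
timeout = "2m"
```

The env var follows the same path-based naming, so `MICASA_EXTRACTION_LLM_TIMEOUT` becomes `MICASA_LLM_EXTRACTION_TIMEOUT`.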


cpcloud and others added 2 commits March 10, 2026 08:43
Separate timeout concerns that were previously conflated:

- ResolvedLLM.Timeout is now the inference context deadline (was HTTP
  client timeout). Per-pipeline overrides (llm.chat.timeout,
  llm.extraction.timeout) inherit from the base llm.timeout.

- HTTP client timeout is derived as max(timeout, QuickOpTimeout) inside
  NewClient, ensuring quick ops aren't killed by short inference timeouts.

- Chat streaming now enforces a context deadline via WithTimeout (was
  WithCancel with no deadline at all).

- extraction.llm_timeout is deprecated in favor of llm.extraction.timeout
  with TOML key migration and env var rename.

- extractionConfig.LLMInferenceTimeout removed; Timeout field serves as
  the inference deadline directly, eliminating the redundant layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add extraction.llm_timeout to deprecatedPaths map in show.go
- Replace magic 5*time.Minute with config.DefaultLLMTimeout in
  chatInferenceTimeout()
- Add TOML and env var migration tests for extraction.llm_timeout ->
  llm.extraction.timeout following existing patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cpcloud cpcloud force-pushed the worktree-silly-noodling-wombat branch from 38c2585 to 0422e3d on March 10, 2026 12:46
@cpcloud cpcloud merged commit bd6f3dc into main Mar 10, 2026
25 checks passed
@cpcloud cpcloud deleted the worktree-silly-noodling-wombat branch March 10, 2026 12:57
cpcloud added a commit that referenced this pull request Mar 19, 2026