refactor(config): fix LLM timeout architecture #746
Pull request overview
This PR refactors the LLM timeout architecture (issue #732) to cleanly separate two concerns: quick-operation timeouts (ping, model listing) and per-pipeline inference timeouts (chat streaming, extraction). Previously, `llm.timeout` served double duty as both an HTTP client timeout and an inference deadline, which conflated two different concerns and created confusing interactions.
Changes:
- Clarified `ResolvedLLM.Timeout` semantics: now represents the inference context deadline; HTTP client timeout is derived as `max(timeout, QuickOpTimeout)` in `NewClient`
- Added chat inference deadline: chat streaming now uses `context.WithTimeout` instead of `context.WithCancel`, enforcing the configured `llm.chat.timeout`
- Deprecated `extraction.llm_timeout`: migrated to `llm.extraction.timeout` with TOML key migration, env var rename, and documentation update; removed the now-redundant `LLMInferenceTimeout` field
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `plans/llm-timeout-architecture.md` | New planning document describing the problem, design, and changes for the timeout refactor |
| `internal/llm/client.go` | `NewClient` now derives HTTP client timeout as `max(timeout, QuickOpTimeout)`; updated doc comment and timeout error message |
| `internal/config/config.go` | Clarified `LLM.Timeout` and `ResolvedLLM.Timeout` doc comments; added `extraction.llm_timeout` TOML migration and `MICASA_EXTRACTION_LLM_TIMEOUT` env var rename entry |
| `internal/app/types.go` | Removed `LLMInferenceTimeout` from `extractionConfig`; `Timeout` field (now annotated as inference context deadline) absorbs its role; updated `SetExtraction` signature |
| `internal/app/model.go` | Removed `llmInferenceTimeout` initialization (field no longer exists) |
| `internal/app/extraction.go` | `llmExtractCmd` now reads `extractionTimeout` directly; `llmPingCmd` uses `client.Timeout()` for the quick-op deadline |
| `internal/app/chat.go` | All three chat stream paths now use `context.WithTimeout` via new `chatInferenceTimeout()` helper |
| `internal/app/extraction_test.go` | Updated test assertion from old `extraction.llm_timeout` key name to new `llm.extraction.timeout` |
| `cmd/micasa/main.go` | Removed the now-redundant `cfg.Extraction.LLMTimeoutDuration()` argument from `SetExtraction` call |
| `docs/content/docs/reference/configuration.md` | Updated all references from old to new config keys; added deprecated env vars table entry |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Separate timeout concerns that were previously conflated:

- `ResolvedLLM.Timeout` is now the inference context deadline (was HTTP client timeout). Per-pipeline overrides (`llm.chat.timeout`, `llm.extraction.timeout`) inherit from the base `llm.timeout`.
- HTTP client timeout is derived as `max(timeout, QuickOpTimeout)` inside `NewClient`, ensuring quick ops aren't killed by short inference timeouts.
- Chat streaming now enforces a context deadline via `WithTimeout` (was `WithCancel` with no deadline at all).
- `extraction.llm_timeout` is deprecated in favor of `llm.extraction.timeout` with TOML key migration and env var rename.
- `extractionConfig.LLMInferenceTimeout` removed; the `Timeout` field serves as the inference deadline directly, eliminating the redundant layer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add `extraction.llm_timeout` to `deprecatedPaths` map in `show.go`
- Replace magic `5*time.Minute` with `config.DefaultLLMTimeout` in `chatInferenceTimeout()`
- Add TOML and env var migration tests for `extraction.llm_timeout` -> `llm.extraction.timeout` following existing patterns

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
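The TOML key migration (`extraction.llm_timeout` -> `llm.extraction.timeout`) can be sketched as a transform over the decoded config map. This is a hypothetical illustration: the actual migration machinery in `internal/config` is not shown in this PR excerpt, and the new-key-wins behavior here is an assumption.

```go
package main

import "fmt"

// migrateDeprecated moves extraction.llm_timeout to
// llm.extraction.timeout in a map as produced by a TOML decoder.
// The old key is deleted; an already-set new key is left untouched.
func migrateDeprecated(cfg map[string]any) {
	ext, _ := cfg["extraction"].(map[string]any)
	if ext == nil {
		return
	}
	old, ok := ext["llm_timeout"]
	if !ok {
		return
	}
	llm, _ := cfg["llm"].(map[string]any)
	if llm == nil {
		llm = map[string]any{}
		cfg["llm"] = llm
	}
	llmExt, _ := llm["extraction"].(map[string]any)
	if llmExt == nil {
		llmExt = map[string]any{}
		llm["extraction"] = llmExt
	}
	if _, exists := llmExt["timeout"]; !exists {
		llmExt["timeout"] = old // migrate only when new key is absent
	}
	delete(ext, "llm_timeout")
}

func main() {
	cfg := map[string]any{
		"extraction": map[string]any{"llm_timeout": "90s"},
	}
	migrateDeprecated(cfg)
	timeout := cfg["llm"].(map[string]any)["extraction"].(map[string]any)["timeout"]
	fmt.Println(timeout) // 90s
}
```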
Force-pushed 38c2585 to 0422e3d
## Summary

Closes #732.

- **Clarify timeout semantics**: `ResolvedLLM.Timeout` is now the inference context deadline (was the HTTP client timeout). Per-pipeline overrides (`llm.chat.timeout`, `llm.extraction.timeout`) inherit from the base `llm.timeout`.
- **Derive HTTP client timeout**: `NewClient` computes `max(timeout, QuickOpTimeout)` so quick ops (ping, model listing) aren't killed by short inference timeouts. `QuickOpTimeout` stays as a non-configurable 30s constant.
- **Add chat inference deadline**: Chat streaming now uses `context.WithTimeout` (was `WithCancel` with no deadline), enforcing the configured `llm.chat.timeout`.
- **Deprecate `extraction.llm_timeout`**: Migrated to `llm.extraction.timeout` with TOML key migration and env var rename (`MICASA_EXTRACTION_LLM_TIMEOUT` -> `MICASA_LLM_EXTRACTION_TIMEOUT`).
- **Remove redundant `LLMInferenceTimeout`**: `extractionConfig.Timeout` now serves as the inference deadline directly, eliminating the extra field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>