
Commit bd6f3dc

Authored by cpcloud and claude
refactor(config): fix LLM timeout architecture (#746)
## Summary

Closes #732.

- **Clarify timeout semantics**: `ResolvedLLM.Timeout` is now the inference context deadline (was the HTTP client timeout). Per-pipeline overrides (`llm.chat.timeout`, `llm.extraction.timeout`) inherit from the base `llm.timeout`.
- **Derive HTTP client timeout**: `NewClient` computes `max(timeout, QuickOpTimeout)` so quick ops (ping, model listing) aren't killed by short inference timeouts. `QuickOpTimeout` stays as a non-configurable 30s constant.
- **Add chat inference deadline**: Chat streaming now uses `context.WithTimeout` (was `WithCancel` with no deadline), enforcing the configured `llm.chat.timeout`.
- **Deprecate `extraction.llm_timeout`**: Migrated to `llm.extraction.timeout` with TOML key migration and env var rename (`MICASA_EXTRACTION_LLM_TIMEOUT` -> `MICASA_LLM_EXTRACTION_TIMEOUT`).
- **Remove redundant `LLMInferenceTimeout`**: `extractionConfig.Timeout` now serves as the inference deadline directly, eliminating the extra field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 847153e commit bd6f3dc

12 files changed: 252 additions & 76 deletions


cmd/micasa/main.go

Lines changed: 0 additions & 1 deletion
```diff
@@ -173,7 +173,6 @@ func (cmd *runCmd) Run() error {
 		exCfg.Thinking,
 		extractors,
 		cfg.Extraction.IsEnabled(),
-		cfg.Extraction.LLMTimeoutDuration(),
 		cfg.Extraction.IsOCRTSV(),
 		cfg.Extraction.OCRConfThreshold(),
 	)
```

docs/content/docs/reference/configuration.md

Lines changed: 11 additions & 8 deletions
```diff
@@ -111,7 +111,7 @@ You can always infer the env var name from the config key.
 | `MICASA_LLM_MODEL` | `qwen3` | `llm.model` | LLM model name |
 | `MICASA_LLM_API_KEY` | (empty) | `llm.api_key` | LLM API key for cloud providers |
 | `MICASA_LLM_EXTRA_CONTEXT` | (empty) | `llm.extra_context` | Custom context appended to LLM system prompts |
-| `MICASA_LLM_TIMEOUT` | `5m` | `llm.timeout` | Max time for a single LLM response |
+| `MICASA_LLM_TIMEOUT` | `5m` | `llm.timeout` | Base inference timeout for LLM responses |
 | `MICASA_LLM_THINKING` | (unset) | `llm.thinking` | Enable model thinking for chat |
 | `MICASA_DOCUMENTS_MAX_FILE_SIZE` | `50 MiB` | `documents.max_file_size` | Max document import size |
 | `MICASA_DOCUMENTS_CACHE_TTL` | `30d` | `documents.cache_ttl` | Document cache lifetime |
@@ -121,7 +121,7 @@ You can always infer the env var name from the config key.
 | `MICASA_EXTRACTION_ENABLE` | `true` | `extraction.enable` | Enable/disable LLM extraction |
 | `MICASA_EXTRACTION_THINKING` | `false` | `extraction.thinking` | Enable model thinking for extraction |
 | `MICASA_EXTRACTION_MAX_PAGES` | `0` | `extraction.max_pages` | Max pages to OCR per document (0 = no limit) |
-| `MICASA_EXTRACTION_LLM_TIMEOUT` | `5m` | `extraction.llm_timeout` | LLM extraction timeout |
+| `MICASA_LLM_EXTRACTION_TIMEOUT` | `5m` | `llm.extraction.timeout` | Extraction inference timeout |
 | `MICASA_EXTRACTION_OCR_ENABLE` | `true` | `extraction.ocr.enable` | Enable/disable OCR on documents |
 | `MICASA_EXTRACTION_OCR_CONFIDENCE_THRESHOLD` | `0` | `extraction.ocr.confidence_threshold` | Min tesseract confidence (0-100) |
 | `MICASA_LOCALE_CURRENCY` | (auto-detect) | `locale.currency` | ISO 4217 currency code (e.g. `USD`, `EUR`, `GBP`) |
@@ -144,6 +144,7 @@ warning. They will be removed in a future release.
 | `MICASA_EXTRACTION_ENABLED` | `MICASA_EXTRACTION_ENABLE` |
 | `MICASA_EXTRACTION_MODEL` | `MICASA_LLM_EXTRACTION_MODEL` |
 | `MICASA_EXTRACTION_THINKING` | `MICASA_LLM_EXTRACTION_THINKING` |
+| `MICASA_EXTRACTION_LLM_TIMEOUT` | `MICASA_LLM_EXTRACTION_TIMEOUT` |
 
 {{% /details %}}
 
@@ -279,9 +280,9 @@ model = "qwen3"
 # Use this to inject domain-specific details about your house, region, etc.
 # extra_context = "My house is a 1920s craftsman in Portland, OR."
 
-# Max time for a single LLM response (including streaming).
+# Base inference timeout for LLM responses (including streaming).
+# Per-pipeline overrides: llm.chat.timeout and llm.extraction.timeout.
 # Go duration syntax: "5m", "10m", etc. Default: "5m".
-# Increase for slow models or complex queries.
 # timeout = "5m"
 
 # Enable model thinking mode for chat (e.g. qwen3 <think> blocks).
@@ -335,7 +336,7 @@ set in `[llm.chat]` and `[llm.extraction]`.
 | `model` | string | `qwen3` | Model identifier sent in chat requests. Must be available on the server. |
 | `api_key` | string | (empty) | Authentication credential. Required for cloud providers (Anthropic, OpenAI, etc.). Leave empty for local servers. |
 | `extra_context` | string | (empty) | Free-form text appended to all LLM system prompts. Useful for telling the model about your house or regional conventions. Currency is handled automatically via `[locale]`. |
-| `timeout` | string | `"5m"` | Max time for a single LLM response (including streaming). Go duration syntax, e.g. `"10m"`. Increase for slow models. |
+| `timeout` | string | `"5m"` | Base inference timeout for LLM responses (including streaming). Per-pipeline overrides: `llm.chat.timeout` and `llm.extraction.timeout`. Go duration syntax, e.g. `"10m"`. |
 | `thinking` | bool | (unset) | Enable model thinking mode (e.g. qwen3 `<think>` blocks). Unset = don't send the option (server default). |
 
 ### `[llm.chat]` section
@@ -350,22 +351,23 @@ than the default.
 | `base_url` | string | (inherits) | Override API base URL for chat. |
 | `model` | string | (inherits) | Override model for chat. |
 | `api_key` | string | (inherits) | Override API key for chat. |
-| `timeout` | string | (inherits) | Override timeout for chat. |
+| `timeout` | string | (inherits) | Chat inference context deadline. Inherits from `llm.timeout` when not set. |
 | `thinking` | string | (inherits) | Override thinking mode for chat. |
 
 ### `[llm.extraction]` section
 
 Per-pipeline LLM overrides for document extraction. Empty fields inherit
 from `[llm]`. Use this to run extraction on a smaller, faster model while
-keeping a more capable model for chat.
+keeping a more capable model for chat. The `timeout` field replaces the
+deprecated `extraction.llm_timeout`.
 
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
 | `provider` | string | (inherits) | Override LLM provider for extraction. |
 | `base_url` | string | (inherits) | Override API base URL for extraction. |
 | `model` | string | (inherits) | Override model for extraction. |
 | `api_key` | string | (inherits) | Override API key for extraction. |
-| `timeout` | string | (inherits) | Override timeout for extraction. |
+| `timeout` | string | (inherits) | Extraction inference context deadline. Replaces `extraction.llm_timeout`. Inherits from `llm.timeout` when not set. |
 | `thinking` | string | (inherits) | Override thinking mode for extraction. |
 
 ### `[documents]` section
@@ -391,6 +393,7 @@ dates, vendor matching) from uploaded documents.
 | `max_pages` | int | `0` | Maximum pages to OCR per scanned document. 0 means no limit. |
 | `enable` | bool | `true` | Set to `false` to disable LLM-powered structured extraction. OCR and pdftotext still run (see `[extraction.ocr]`). |
 | `enabled` | bool | -- | **Deprecated.** Use `enable` instead. |
+| `llm_timeout` | string | `"5m"` | **Deprecated.** Use `[llm.extraction] timeout` instead. |
 | `thinking` | bool | `false` | **Deprecated.** Use `[llm.extraction] thinking` instead. |
 
 ### `[extraction.ocr]` section
```

internal/app/chat.go

Lines changed: 14 additions & 4 deletions
```diff
@@ -15,6 +15,7 @@ import (
 	tea "github.com/charmbracelet/bubbletea"
 	"github.com/charmbracelet/lipgloss"
 	"github.com/charmbracelet/x/ansi"
+	"github.com/cpcloud/micasa/internal/config"
 	"github.com/cpcloud/micasa/internal/data"
 	"github.com/cpcloud/micasa/internal/llm"
 	ollamaPull "github.com/cpcloud/micasa/internal/ollama"
@@ -318,11 +319,20 @@ func (m *Model) submitChat() tea.Cmd {
 	return tea.Batch(m.startSQLStream(query), m.chat.Spinner.Tick)
 }
 
+// chatInferenceTimeout returns the configured chat inference timeout.
+func (m *Model) chatInferenceTimeout() time.Duration {
+	if m.llmConfig != nil && m.llmConfig.Timeout > 0 {
+		return m.llmConfig.Timeout
+	}
+	return config.DefaultLLMTimeout
+}
+
 // startSQLStream initiates streaming SQL generation (stage 1).
 func (m *Model) startSQLStream(query string) tea.Cmd {
 	client := m.llmClient
 	store := m.store
 	extraContext := m.llmExtraContext
+	chatTimeout := m.chatInferenceTimeout()
 	// Capture conversation history on the main goroutine before the closure
 	// runs in a background goroutine -- m.chat.Messages is mutated by the
 	// Bubble Tea event loop and is not safe to read concurrently.
@@ -346,8 +356,8 @@ func (m *Model) startSQLStream(query string) tea.Cmd {
 	messages = append(messages, llm.Message{Role: roleUser, Content: query})
 
 	//nolint:gosec // cancel stored in CancelFn, called on ctrl+c
-	ctx, cancel := context.WithCancel(
-		context.Background(),
+	ctx, cancel := context.WithTimeout(
+		context.Background(), chatTimeout,
 	)
 	streamCh, err := client.ChatStream(ctx, messages)
 	if err != nil {
@@ -763,7 +773,7 @@ func (m *Model) handleSQLResult(msg sqlResultMsg) tea.Cmd {
 		{Role: roleUser, Content: "Summarize these results."},
 	}
 
-	ctx, cancel := context.WithCancel(context.Background())
+	ctx, cancel := context.WithTimeout(context.Background(), m.chatInferenceTimeout())
 	ch, err := m.llmClient.ChatStream(ctx, messages)
 	if err != nil {
 		cancel()
@@ -788,7 +798,7 @@
 func (m *Model) startFallbackStream(question string) tea.Cmd {
 	messages := m.buildFallbackMessages(question)
 
-	ctx, cancel := context.WithCancel(context.Background())
+	ctx, cancel := context.WithTimeout(context.Background(), m.chatInferenceTimeout())
 	ch, err := m.llmClient.ChatStream(ctx, messages)
 	if err != nil {
 		cancel()
```

internal/app/extraction.go

Lines changed: 4 additions & 3 deletions
```diff
@@ -492,8 +492,9 @@ func (m *Model) llmPingCmd(state *extractionLogState) tea.Cmd {
 		return nil
 	}
 	id := state.ID
+	quickOpTimeout := client.Timeout()
 	return func() tea.Msg {
-		ctx, cancel := context.WithTimeout(context.Background(), llm.QuickOpTimeout)
+		ctx, cancel := context.WithTimeout(context.Background(), quickOpTimeout)
 		defer cancel()
 		err := client.Ping(ctx)
 		return extractionLLMPingMsg{ID: id, Err: err}
@@ -508,7 +509,7 @@ func (m *Model) llmExtractCmd(ctx context.Context, ex *extractionLogState) tea.C
 	}
 	schemaCtx := m.buildSchemaContext()
 	id := ex.ID
-	timeout := m.ex.llmInferenceTimeout
+	timeout := m.ex.extractionTimeout
 	return func() tea.Msg {
 		llmCtx := ctx
 		if timeout > 0 {
@@ -743,7 +744,7 @@ func (m *Model) handleExtractionLLMChunk(msg extractionLLMChunkMsg) tea.Cmd {
 	errMsg := msg.Err.Error()
 	if errors.Is(msg.Err, context.DeadlineExceeded) {
 		errMsg = fmt.Sprintf(
-			"timed out after %s -- increase extraction.llm_timeout in config",
+			"timed out after %s -- increase llm.extraction.timeout in config",
 			step.Elapsed.Truncate(time.Second),
 		)
 	}
```

internal/app/extraction_test.go

Lines changed: 1 addition & 1 deletion
```diff
@@ -1016,7 +1016,7 @@ func TestLLMExtraction_TimeoutError(t *testing.T) {
 	assert.Equal(t, stepFailed, step.Status)
 	require.NotEmpty(t, step.Logs)
 	assert.Contains(t, step.Logs[0], "timed out")
-	assert.Contains(t, step.Logs[0], "extraction.llm_timeout")
+	assert.Contains(t, step.Logs[0], "llm.extraction.timeout")
 }
 
 func TestLLMExtraction_TimeoutError_NonDeadlinePreservesOriginal(t *testing.T) {
```

internal/app/model.go

Lines changed: 10 additions & 11 deletions
```diff
@@ -266,17 +266,16 @@ func NewModel(store *data.Store, options Options) (*Model, error) {
 		llmExtraContext: extraContext,
 		filePickerDir:   options.FilePickerDir,
 		ex: extractState{
-			extractionProvider:  options.ExtractionConfig.Provider,
-			extractionBaseURL:   options.ExtractionConfig.BaseURL,
-			extractionModel:     options.ExtractionConfig.Model,
-			extractionAPIKey:    options.ExtractionConfig.APIKey,
-			extractionTimeout:   options.ExtractionConfig.Timeout,
-			extractionThinking:  options.ExtractionConfig.Thinking,
-			extractionEnabled:   options.ExtractionConfig.Enabled,
-			ocrTSV:              options.ExtractionConfig.OCRTSV,
-			ocrConfThreshold:    options.ExtractionConfig.OCRConfThreshold,
-			extractors:          options.ExtractionConfig.Extractors,
-			llmInferenceTimeout: options.ExtractionConfig.LLMInferenceTimeout,
+			extractionProvider: options.ExtractionConfig.Provider,
+			extractionBaseURL:  options.ExtractionConfig.BaseURL,
+			extractionModel:    options.ExtractionConfig.Model,
+			extractionAPIKey:   options.ExtractionConfig.APIKey,
+			extractionTimeout:  options.ExtractionConfig.Timeout,
+			extractionThinking: options.ExtractionConfig.Thinking,
+			extractionEnabled:  options.ExtractionConfig.Enabled,
+			ocrTSV:             options.ExtractionConfig.OCRTSV,
+			ocrConfThreshold:   options.ExtractionConfig.OCRConfThreshold,
+			extractors:         options.ExtractionConfig.Extractors,
 		},
 		pull:   pullState{progress: pprog},
 		styles: appStyles,
```

internal/app/types.go

Lines changed: 30 additions & 34 deletions
```diff
@@ -87,19 +87,18 @@ func (fs *formState) formKind() FormKind {
 type extractState struct {
 	// Extraction-specific LLM connection settings. When extractionProvider
 	// differs from the chat provider, an independent client is created.
-	extractionProvider  string
-	extractionBaseURL   string
-	extractionModel     string
-	extractionAPIKey    string
-	extractionTimeout   time.Duration
-	extractionThinking  string
-	extractionEnabled   bool
-	ocrTSV              bool
-	ocrConfThreshold    int
-	extractionClient    *llm.Client
-	extractors          []extract.Extractor
-	extractionReady     bool
-	llmInferenceTimeout time.Duration
+	extractionProvider string
+	extractionBaseURL  string
+	extractionModel    string
+	extractionAPIKey   string
+	extractionTimeout  time.Duration // inference context deadline
+	extractionThinking string
+	extractionEnabled  bool
+	ocrTSV             bool
+	ocrConfThreshold   int
+	extractionClient   *llm.Client
+	extractors         []extract.Extractor
+	extractionReady    bool
 
 	pendingExtractionDocID *uint
 	extraction             *extractionLogState
@@ -277,22 +276,21 @@ type llmConfig struct {
 	Model        string
 	APIKey       string //nolint:gosec // G101 false positive: field name triggers heuristic, not a hardcoded credential
 	ExtraContext string
-	Timeout      time.Duration
-	Thinking     string // reasoning effort: none|low|medium|high|auto
+	Timeout      time.Duration // inference context deadline
+	Thinking     string        // reasoning effort: none|low|medium|high|auto
 }
 
 // extractionConfig holds resolved extraction pipeline settings.
 type extractionConfig struct {
 	// LLM connection settings for extraction. When Provider is non-empty,
 	// the extraction pipeline creates its own LLM client independent of
 	// the chat client. When empty, falls back to the chat client.
-	Provider            string
-	BaseURL             string
-	Model               string
-	APIKey              string //nolint:gosec // G117 false positive: field name, not a hardcoded credential
-	Timeout             time.Duration
-	Thinking            string // reasoning effort level
-	LLMInferenceTimeout time.Duration
+	Provider string
+	BaseURL  string
+	Model    string
+	APIKey   string //nolint:gosec // G117 false positive: field name, not a hardcoded credential
+	Timeout  time.Duration // inference context deadline
+	Thinking string        // reasoning effort level
 
 	Extractors []extract.Extractor // configured extractors; nil = defaults
 	Enabled    bool                // LLM extraction enabled
@@ -307,22 +305,20 @@ func (o *Options) SetExtraction(
 	thinking string,
 	extractors []extract.Extractor,
 	enabled bool,
-	llmInferenceTimeout time.Duration,
 	ocrTSV bool,
 	ocrConfThreshold int,
 ) {
 	o.ExtractionConfig = extractionConfig{
-		Provider:            provider,
-		BaseURL:             baseURL,
-		Model:               model,
-		APIKey:              apiKey,
-		Timeout:             timeout,
-		Thinking:            thinking,
-		LLMInferenceTimeout: llmInferenceTimeout,
-		Extractors:          extractors,
-		Enabled:             enabled,
-		OCRTSV:              ocrTSV,
-		OCRConfThreshold:    ocrConfThreshold,
+		Provider:         provider,
+		BaseURL:          baseURL,
+		Model:            model,
+		APIKey:           apiKey,
+		Timeout:          timeout,
+		Thinking:         thinking,
+		Extractors:       extractors,
+		Enabled:          enabled,
+		OCRTSV:           ocrTSV,
+		OCRConfThreshold: ocrConfThreshold,
 	}
 }
 
```

internal/config/config.go

Lines changed: 19 additions & 9 deletions
```diff
@@ -66,9 +66,9 @@ type LLM struct {
 	// Currency is handled by [locale] section. Optional; defaults to empty.
 	ExtraContext string `toml:"extra_context"`
 
-	// Timeout is the maximum time for a single LLM response (including
+	// Timeout is the base inference timeout for LLM responses (including
 	// streaming). Go duration string, e.g. "5m", "10m". Default: "5m".
-	// Quick operations (ping, model listing) use a shorter fixed deadline.
+	// Per-pipeline overrides: llm.chat.timeout and llm.extraction.timeout.
 	Timeout string `toml:"timeout" default:"5m"`
 
 	// Thinking controls the model's reasoning effort level. Supported values:
@@ -114,7 +114,7 @@ type ResolvedLLM struct {
 	Model        string
 	APIKey       string //nolint:gosec // resolved config field, not a hardcoded credential
 	ExtraContext string
-	Timeout      time.Duration
+	Timeout      time.Duration // inference context deadline for this pipeline
 	Thinking     string
 }
 
@@ -965,12 +965,23 @@ func migrateRenamedKeys(cfg *Config, md toml.MetaData, path string) {
 			"extraction.thinking is deprecated -- use llm.extraction.thinking instead",
 		)
 	}
+
+	// extraction.llm_timeout -> llm.extraction.timeout (v1.80)
+	if md.IsDefined("extraction", "llm_timeout") && !md.IsDefined("llm", "extraction", "timeout") {
+		cfg.LLM.Extraction.Timeout = cfg.Extraction.LLMTimeout
+		cfg.Warnings = append(cfg.Warnings,
+			"extraction.llm_timeout is deprecated -- use llm.extraction.timeout instead",
+		)
+	}
 }
 
 // envRenames maps deprecated environment variable names to their canonical
 // replacements. Processed newest-first so that the most recent intermediate
 // name wins when multiple generations of the same variable are set.
 var envRenames = []struct{ old, canonical string }{
+	// v1.80: extraction.llm_timeout -> llm.extraction.timeout.
+	{"MICASA_EXTRACTION_LLM_TIMEOUT", "MICASA_LLM_EXTRACTION_TIMEOUT"},
+
 	// v1.78: extraction.enabled -> extraction.enable.
 	{"MICASA_EXTRACTION_ENABLED", "MICASA_EXTRACTION_ENABLE"},
 
@@ -1105,9 +1116,9 @@ model = "` + DefaultModel + `"
 # Use this to inject domain-specific details about your house, region, etc.
 # extra_context = "My house is a 1920s craftsman in Portland, OR."
 
-# Maximum time for a single LLM response (including streaming).
+# Base inference timeout for LLM responses (including streaming).
 # Go duration syntax: "5m", "10m", etc. Default: "5m".
-# Increase for very slow models or complex queries.
+# Per-pipeline overrides: llm.chat.timeout and llm.extraction.timeout.
 # timeout = "5m"
 
 # Model reasoning effort level. Supported: none, low, medium, high, auto.
@@ -1121,7 +1132,7 @@ model = "` + DefaultModel + `"
 # base_url = "https://api.anthropic.com"
 # model = "claude-sonnet-4-5-20250929"
 # api_key = "sk-ant-..."
-# timeout = "10s"
+# timeout = "5m" # inference context deadline (default: 5m)
 # thinking = "medium"
 
 # [llm.extraction]
@@ -1131,7 +1142,7 @@ model = "` + DefaultModel + `"
 # base_url = "https://api.anthropic.com"
 # model = "claude-haiku-3-5-20241022"
 # api_key = "sk-ant-..."
-# timeout = "15s"
+# timeout = "5m" # inference context deadline (default: 5m)
 # thinking = "low"
 
 [documents]
@@ -1153,8 +1164,7 @@ model = "` + DefaultModel + `"
 # still run (see [extraction.ocr]) to populate document text for search/display.
 # enable = true
 
-# Timeout for LLM extraction inference. Go duration syntax: "5m", "90s", etc.
-# Default: "5m". Increase for slow local models or complex documents.
+# Deprecated: use [llm.extraction] timeout instead.
 # llm_timeout = "5m"
 
 # Maximum pages for async extraction of scanned documents. 0 = no limit. Default: 0.
```
