## Summary
Closes #732.
- **Clarify timeout semantics**: `ResolvedLLM.Timeout` is now the
inference context deadline (was the HTTP client timeout). Per-pipeline
overrides (`llm.chat.timeout`, `llm.extraction.timeout`) inherit from
the base `llm.timeout`.
- **Derive HTTP client timeout**: `NewClient` computes `max(timeout,
QuickOpTimeout)` so quick ops (ping, model listing) aren't killed by
short inference timeouts. `QuickOpTimeout` stays as a non-configurable
30s constant.
- **Add chat inference deadline**: Chat streaming now uses
`context.WithTimeout` (was `WithCancel` with no deadline), enforcing the
configured `llm.chat.timeout`.
- **Deprecate `extraction.llm_timeout`**: Migrated to
`llm.extraction.timeout` with TOML key migration and env var rename
(`MICASA_EXTRACTION_LLM_TIMEOUT` -> `MICASA_LLM_EXTRACTION_TIMEOUT`).
- **Remove redundant `LLMInferenceTimeout`**: `extractionConfig.Timeout`
now serves as the inference deadline directly, eliminating the extra
field.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
```diff
@@ -279,9 +280,9 @@
 # Use this to inject domain-specific details about your house, region, etc.
 # extra_context = "My house is a 1920s craftsman in Portland, OR."
 
-# Max time for a single LLM response (including streaming).
+# Base inference timeout for LLM responses (including streaming).
+# Per-pipeline overrides: llm.chat.timeout and llm.extraction.timeout.
 # Go duration syntax: "5m", "10m", etc. Default: "5m".
-# Increase for slow models or complex queries.
 # timeout = "5m"
 
 # Enable model thinking mode for chat (e.g. qwen3 <think> blocks).
@@ -335,7 +336,7 @@ set in `[llm.chat]` and `[llm.extraction]`.
 |`model`| string |`qwen3`| Model identifier sent in chat requests. Must be available on the server. |
 |`api_key`| string | (empty) | Authentication credential. Required for cloud providers (Anthropic, OpenAI, etc.). Leave empty for local servers. |
 |`extra_context`| string | (empty) | Free-form text appended to all LLM system prompts. Useful for telling the model about your house or regional conventions. Currency is handled automatically via `[locale]`. |
-|`timeout`| string |`"5m"`|Max time for a single LLM response (including streaming). Go duration syntax, e.g. `"10m"`. Increase for slow models. |
+|`timeout`| string |`"5m"`|Base inference timeout for LLM responses (including streaming). Per-pipeline overrides: `llm.chat.timeout` and `llm.extraction.timeout`. Go duration syntax, e.g. `"10m"`. |
```
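A config under the new inheritance semantics might look like this (a sketch; the values are illustrative, and only the key names come from the summary above):

```toml
[llm]
timeout = "5m"    # base inference timeout for all pipelines

[llm.chat]
timeout = "10m"   # overrides the base for chat streaming

[llm.extraction]
# timeout not set: inherits llm.timeout ("5m")
```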