
Commit 7b9d7e5

cpcloud and claude authored
feat(config): normalize enable flags and add extraction.ocr subtable (#735)
## Summary

- Rename `extraction.enabled` to `extraction.enable` with TOML key migration and env var deprecation (`MICASA_EXTRACTION_ENABLED` -> `MICASA_EXTRACTION_ENABLE`)
- Add `[extraction.ocr]` subtable with `enable` (bool) and `confidence_threshold` (int) fields for independent OCR control
- Wire OCR config through `DefaultExtractors`: OCR extractors are conditionally included based on `extraction.ocr.enable`, and low-confidence words are filtered by `confidence_threshold`
- Remove `text_timeout` from the config surface: pdftotext is fast, and the 30s safety net stays as an internal `DefaultTextTimeout` constant
- Fix `FormatDuration` to produce clean notation for whole minutes/hours (`5m` not `5m0s`)
- Fix docs that incorrectly described `llm.timeout` as a 5s quick-op timeout (it is a 5m HTTP response timeout)

closes #729

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 31a808c commit 7b9d7e5

15 files changed

Lines changed: 354 additions & 121 deletions


.claude/codebase/types.md

Lines changed: 3 additions & 2 deletions
@@ -120,12 +120,13 @@ Col* (e.g., ColID = "id", ColName = "name", ColDeletedAt = "deleted_at")
 - LLM (provider, model, baseURL, apiKey, timeout, thinking, extraContext)
 - Chat/Extraction overrides (LLMChatOverride, LLMExtractionOverride)
 - Documents (MaxFileSize ByteSize, CacheTTL Duration)
-- Extraction (MaxPages int, Enabled *bool, TextTimeout, LLMTimeout)
+- Extraction (MaxPages int, Enable *bool, LLMTimeout)
+- OCR (Enable *bool, ConfidenceThreshold int)
 - Locale (Currency string)

 ### Defaults
 - Provider: "ollama", Model: "qwen3", BaseURL: "http://localhost:11434"
-- MaxPages: 20, CacheTTL: 30 days, TextTimeout: 30s, LLMTimeout: 5m
+- MaxPages: 20, CacheTTL: 30 days, LLMTimeout: 5m

 ## LLM Types (internal/llm/)

cmd/micasa/main.go

Lines changed: 3 additions & 1 deletion
@@ -161,7 +161,9 @@ func (cmd *runCmd) Run() error {
 	exCfg := cfg.LLM.ExtractionConfig()
 	extractors := extract.DefaultExtractors(
 		cfg.Extraction.MaxPages,
-		cfg.Extraction.TextTimeoutDuration(),
+		0, // pdftotext uses its own internal default timeout (30s)
+		cfg.Extraction.IsOCREnabled(),
+		cfg.Extraction.OCR.ConfidenceThreshold,
 	)
 	opts.SetExtraction(
 		exCfg.Provider,
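
The body of `extract.DefaultExtractors` is not shown in this commit view, so how it consumes the two new arguments can only be sketched. A minimal illustration of the behavior the summary describes, assuming OCR output is a list of words with a 0-100 confidence (the `ocrWord` type, both function names, and the extractor labels are hypothetical):

```go
package main

import "fmt"

// ocrWord is a hypothetical stand-in for one recognized word.
type ocrWord struct {
	Text       string
	Confidence int // tesseract-style confidence, 0-100
}

// filterByConfidence drops words whose confidence is below threshold.
// A threshold of 0 means no filtering, matching the documented default.
func filterByConfidence(words []ocrWord, threshold int) []ocrWord {
	if threshold <= 0 {
		return words
	}
	kept := make([]ocrWord, 0, len(words))
	for _, w := range words {
		if w.Confidence >= threshold {
			kept = append(kept, w)
		}
	}
	return kept
}

// extractorNames sketches conditional inclusion: the text extractor is
// always present, OCR extractors only when OCR is enabled. The labels
// are made up for this example.
func extractorNames(ocrEnabled bool) []string {
	names := []string{"pdftotext"}
	if ocrEnabled {
		names = append(names, "ocr-pdf", "ocr-image")
	}
	return names
}

func main() {
	words := []ocrWord{{"Invoice", 96}, {"t0ta1", 31}, {"$120.00", 88}}
	fmt.Println(filterByConfidence(words, 60))
	fmt.Println(extractorNames(false))
}
```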

docs/content/docs/reference/configuration.md

Lines changed: 25 additions & 19 deletions
@@ -111,18 +111,19 @@ You can always infer the env var name from the config key.
 | `MICASA_LLM_MODEL` | `qwen3` | `llm.model` | LLM model name |
 | `MICASA_LLM_API_KEY` | (empty) | `llm.api_key` | LLM API key for cloud providers |
 | `MICASA_LLM_EXTRA_CONTEXT` | (empty) | `llm.extra_context` | Custom context appended to LLM system prompts |
-| `MICASA_LLM_TIMEOUT` | `5s` | `llm.timeout` | LLM operation timeout |
+| `MICASA_LLM_TIMEOUT` | `5m` | `llm.timeout` | Max time for a single LLM response |
 | `MICASA_LLM_THINKING` | (unset) | `llm.thinking` | Enable model thinking for chat |
 | `MICASA_DOCUMENTS_MAX_FILE_SIZE` | `50 MiB` | `documents.max_file_size` | Max document import size |
 | `MICASA_DOCUMENTS_CACHE_TTL` | `30d` | `documents.cache_ttl` | Document cache lifetime |
 | `MICASA_DOCUMENTS_CACHE_TTL_DAYS` | -- | `documents.cache_ttl_days` | Deprecated; use `MICASA_DOCUMENTS_CACHE_TTL` |
 | `MICASA_DOCUMENTS_FILE_PICKER_DIR` | (Downloads) | `documents.file_picker_dir` | Starting directory for the file picker |
 | `MICASA_EXTRACTION_MODEL` | (chat model) | `extraction.model` | LLM model for document extraction |
-| `MICASA_EXTRACTION_ENABLED` | `true` | `extraction.enabled` | Enable/disable LLM extraction |
+| `MICASA_EXTRACTION_ENABLE` | `true` | `extraction.enable` | Enable/disable LLM extraction |
 | `MICASA_EXTRACTION_THINKING` | `false` | `extraction.thinking` | Enable model thinking for extraction |
-| `MICASA_EXTRACTION_TEXT_TIMEOUT` | `30s` | `extraction.text_timeout` | pdftotext timeout |
 | `MICASA_EXTRACTION_MAX_PAGES` | `0` | `extraction.max_pages` | Max pages to OCR per document (0 = no limit) |
 | `MICASA_EXTRACTION_LLM_TIMEOUT` | `5m` | `extraction.llm_timeout` | LLM extraction timeout |
+| `MICASA_EXTRACTION_OCR_ENABLE` | `true` | `extraction.ocr.enable` | Enable/disable OCR on documents |
+| `MICASA_EXTRACTION_OCR_CONFIDENCE_THRESHOLD` | `0` | `extraction.ocr.confidence_threshold` | Min tesseract confidence (0-100) |
 | `MICASA_LOCALE_CURRENCY` | (auto-detect) | `locale.currency` | ISO 4217 currency code (e.g. `USD`, `EUR`, `GBP`) |

 {{% details title="Deprecated env var names" closed="true" %}}
@@ -139,8 +140,8 @@ warning. They will be removed in a future release.
 | `MICASA_CURRENCY` | `MICASA_LOCALE_CURRENCY` |
 | `MICASA_EXTRACTION_MAX_EXTRACT_PAGES` | `MICASA_EXTRACTION_MAX_PAGES` |
 | `MICASA_MAX_EXTRACT_PAGES` | `MICASA_EXTRACTION_MAX_PAGES` |
-| `MICASA_TEXT_TIMEOUT` | `MICASA_EXTRACTION_TEXT_TIMEOUT` |
 | `MICASA_MAX_OCR_PAGES` | `MICASA_EXTRACTION_MAX_PAGES` |
+| `MICASA_EXTRACTION_ENABLED` | `MICASA_EXTRACTION_ENABLE` |
 | `MICASA_EXTRACTION_MODEL` | `MICASA_LLM_EXTRACTION_MODEL` |
 | `MICASA_EXTRACTION_THINKING` | `MICASA_LLM_EXTRACTION_THINKING` |

@@ -177,12 +178,12 @@ micasa # uses llama3.3 instead of the default qwen3

 ### `MICASA_LLM_TIMEOUT`

-Sets the LLM timeout for quick operations (ping, model listing), overriding
-the config file value. Uses Go duration syntax:
+Sets the maximum time for a single LLM response (including streaming),
+overriding the config file value. Uses Go duration syntax:

 ```sh
-export MICASA_LLM_TIMEOUT=15s
-micasa # waits up to 15s for LLM server responses
+export MICASA_LLM_TIMEOUT=10m
+micasa # waits up to 10m for LLM responses
 ```

 ### `MICASA_DOCUMENTS_MAX_FILE_SIZE`
@@ -278,10 +279,10 @@ model = "qwen3"
 # Use this to inject domain-specific details about your house, region, etc.
 # extra_context = "My house is a 1920s craftsman in Portland, OR."

-# Timeout for quick LLM server operations (ping, model listing).
-# Go duration syntax: "5s", "10s", "500ms", etc. Default: "5s".
-# Increase if your LLM server is slow to respond.
-# timeout = "5s"
+# Max time for a single LLM response (including streaming).
+# Go duration syntax: "5m", "10m", etc. Default: "5m".
+# Increase for slow models or complex queries.
+# timeout = "5m"

 # Enable model thinking mode for chat (e.g. qwen3 <think> blocks).
 # Unset = don't send (server default), true = enable, false = disable.
@@ -302,10 +303,6 @@ model = "qwen3"
 # with small, fast models optimized for structured JSON output.
 # model = "qwen2.5:7b"

-# Timeout for pdftotext. Go duration syntax: "30s", "1m", etc. Default: "30s".
-# Increase if you routinely process very large PDFs.
-# text_timeout = "30s"
-
 # Maximum pages to OCR for scanned documents. 0 = no limit. Default: 0.
 # max_pages = 0

@@ -338,7 +335,7 @@ set in `[llm.chat]` and `[llm.extraction]`.
 | `model` | string | `qwen3` | Model identifier sent in chat requests. Must be available on the server. |
 | `api_key` | string | (empty) | Authentication credential. Required for cloud providers (Anthropic, OpenAI, etc.). Leave empty for local servers. |
 | `extra_context` | string | (empty) | Free-form text appended to all LLM system prompts. Useful for telling the model about your house or regional conventions. Currency is handled automatically via `[locale]`. |
-| `timeout` | string | `"5s"` | Max wait time for quick LLM operations (ping, model listing). Go duration syntax, e.g. `"10s"`, `"500ms"`. Increase for slow servers. |
+| `timeout` | string | `"5m"` | Max time for a single LLM response (including streaming). Go duration syntax, e.g. `"10m"`. Increase for slow models. |
 | `thinking` | bool | (unset) | Enable model thinking mode (e.g. qwen3 `<think>` blocks). Unset = don't send the option (server default). |

 ### `[llm.chat]` section
@@ -391,11 +388,20 @@ dates, vendor matching) from uploaded documents.
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
 | `model` | string | (chat model) | **Deprecated.** Use `[llm.extraction] model` instead. Falls back to `llm.model` if empty. |
-| `text_timeout` | string | `"30s"` | Max time for `pdftotext` to run. Go duration syntax, e.g. `"1m"`. Increase for very large PDFs. |
 | `max_pages` | int | `0` | Maximum pages to OCR per scanned document. 0 means no limit. |
-| `enabled` | bool | `true` | Set to `false` to disable LLM-powered extraction. When disabled, no structured data is extracted from documents. |
+| `enable` | bool | `true` | Set to `false` to disable LLM-powered structured extraction. OCR and pdftotext still run (see `[extraction.ocr]`). |
+| `enabled` | bool | -- | **Deprecated.** Use `enable` instead. |
 | `thinking` | bool | `false` | **Deprecated.** Use `[llm.extraction] thinking` instead. |

+### `[extraction.ocr]` section
+
+OCR sub-pipeline settings. Requires `tesseract` and `pdftocairo`.
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `enable` | bool | `true` | Set to `false` to disable OCR on documents. When disabled, scanned pages and images produce no text. |
+| `confidence_threshold` | int | `0` | Minimum tesseract word confidence (0-100) to keep. Words below this threshold are dropped. 0 means no filtering. |
+
 ### `[locale]` section

 Locale and currency settings. Controls currency formatting across all money
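
The configuration reference states that the env var name can always be inferred from the config key. Judging by the rows in the table (for example `extraction.ocr.confidence_threshold` and `MICASA_EXTRACTION_OCR_CONFIDENCE_THRESHOLD`), the rule appears to be: prefix with `MICASA_`, uppercase, and turn dots into underscores. A small sketch of that inference, where `envVarName` is a hypothetical helper and not a function from the micasa codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName derives an env var name from a dotted config key using the
// rule inferred from the reference table. Hypothetical helper.
func envVarName(key string) string {
	return "MICASA_" + strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

func main() {
	for _, key := range []string{
		"llm.timeout",
		"extraction.enable",
		"extraction.ocr.enable",
		"extraction.ocr.confidence_threshold",
	} {
		fmt.Printf("%-38s %s\n", key, envVarName(key))
	}
}
```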

internal/config/config.go

Lines changed: 71 additions & 43 deletions
@@ -264,14 +264,14 @@ type Extraction struct {
 	// documents. 0 means no limit (all pages). Default: 0.
 	MaxPages int `toml:"max_pages"`

-	// Enabled controls whether LLM-powered extraction runs when a document
-	// is uploaded. When disabled, no structured data is extracted -- OCR and
-	// pdftotext are internal pipeline steps, not standalone features. Default: true.
-	Enabled *bool `toml:"enabled,omitempty"`
+	// Enable controls whether LLM-powered structured extraction runs when
+	// a document is uploaded. When disabled, no structured data is extracted
+	// from documents. OCR and pdftotext still run independently (controlled
+	// by [extraction.ocr]) to populate the document's stored text. Default: true.
+	Enable *bool `toml:"enable,omitempty"`

-	// TextTimeout is the maximum time to wait for pdftotext. Go duration
-	// string, e.g. "30s", "1m". Default: "30s".
-	TextTimeout string `toml:"text_timeout"`
+	// Enabled is the deprecated spelling; migrated to Enable on load.
+	Enabled *bool `toml:"enabled,omitempty"`

 	// LLMTimeout is the maximum time to wait for the LLM extraction
 	// inference step. Go duration string, e.g. "5m", "90s". Default: "5m".
@@ -281,28 +281,39 @@ type Extraction struct {
 	// Supported values: none, low, medium, high, auto.
 	// Empty string = don't send (server default). Default: empty.
 	Thinking string `toml:"thinking,omitempty"`
+
+	// OCR holds settings for the OCR sub-pipeline.
+	OCR OCR `toml:"ocr" doc:"OCR sub-pipeline. Requires tesseract and pdftocairo."`
+}
+
+// OCR holds settings for the OCR sub-pipeline within extraction.
+type OCR struct {
+	// Enable controls whether OCR runs on uploaded documents.
+	// When disabled, scanned pages and images produce no text. Default: true.
+	Enable *bool `toml:"enable,omitempty"`
+
+	// ConfidenceThreshold is the minimum tesseract word confidence (0-100)
+	// to keep in OCR output. Words below this threshold are dropped.
+	// 0 means no filtering (all words kept). Default: 0.
+	ConfidenceThreshold int `toml:"confidence_threshold"`
 }

 // IsEnabled returns whether LLM extraction is enabled. Defaults to true
 // when the field is unset.
 func (e Extraction) IsEnabled() bool {
-	if e.Enabled != nil {
-		return *e.Enabled
+	if e.Enable != nil {
+		return *e.Enable
 	}
 	return true
 }

-// TextTimeoutDuration returns the parsed text extraction timeout, falling
-// back to DefaultTextTimeout if the value is empty or unparseable.
-func (e Extraction) TextTimeoutDuration() time.Duration {
-	if e.TextTimeout == "" {
-		return DefaultTextTimeout
-	}
-	d, err := time.ParseDuration(e.TextTimeout)
-	if err != nil {
-		return DefaultTextTimeout
+// IsOCREnabled returns whether OCR is enabled. Defaults to true when
+// the field is unset.
+func (e Extraction) IsOCREnabled() bool {
+	if e.OCR.Enable != nil {
+		return *e.OCR.Enable
 	}
-	return d
+	return true
 }

 // LLMTimeoutDuration returns the parsed LLM extraction timeout, falling
@@ -341,7 +352,6 @@ const (
 	DefaultLLMExtractionTimeout = DefaultLLMTimeout
 	DefaultCacheTTL = 30 * 24 * time.Hour // 30 days
 	DefaultMaxPages = 0
-	DefaultTextTimeout = 30 * time.Second
 	configRelPath = "micasa/config.toml"
 )

@@ -378,6 +388,15 @@ func LoadFromPath(path string) (Config, error) {
 		return cfg, err
 	}

+	// Clear deprecated Enabled again: applyEnvOverrides may have
+	// repopulated it from MICASA_EXTRACTION_ENABLED.
+	if cfg.Extraction.Enabled != nil {
+		if cfg.Extraction.Enable == nil {
+			cfg.Extraction.Enable = cfg.Extraction.Enabled
+		}
+		cfg.Extraction.Enabled = nil
+	}
+
 	// Normalize base URLs: strip trailing slash and /v1 suffix --
 	// providers handle their own path construction.
 	cfg.LLM.BaseURL = normalizeBaseURL(cfg.LLM.BaseURL)
@@ -488,22 +507,6 @@ func LoadFromPath(path string) (Config, error) {
 		)
 	}

-	if cfg.Extraction.TextTimeout != "" {
-		d, err := time.ParseDuration(cfg.Extraction.TextTimeout)
-		if err != nil {
-			return cfg, fmt.Errorf(
-				"extraction.text_timeout: invalid duration %q -- use Go syntax like \"30s\" or \"1m\"",
-				cfg.Extraction.TextTimeout,
-			)
-		}
-		if d <= 0 {
-			return cfg, fmt.Errorf(
-				"extraction.text_timeout must be positive, got %s",
-				cfg.Extraction.TextTimeout,
-			)
-		}
-	}
-
 	if cfg.Extraction.LLMTimeout != "" {
 		d, err := time.ParseDuration(cfg.Extraction.LLMTimeout)
 		if err != nil {
@@ -527,6 +530,13 @@ func LoadFromPath(path string) (Config, error) {
 		)
 	}

+	if cfg.Extraction.OCR.ConfidenceThreshold < 0 || cfg.Extraction.OCR.ConfidenceThreshold > 100 {
+		return cfg, fmt.Errorf(
+			"extraction.ocr.confidence_threshold must be 0-100, got %d",
+			cfg.Extraction.OCR.ConfidenceThreshold,
+		)
+	}
+
 	checkFilePermissions(&cfg, path)

 	return cfg, nil
@@ -905,6 +915,17 @@ func migrateRenamedKeys(cfg *Config, md toml.MetaData, path string) {
 		)
 	}

+	// extraction.enabled -> extraction.enable (v1.78)
+	if md.IsDefined("extraction", "enabled") {
+		if !md.IsDefined("extraction", "enable") {
+			cfg.Extraction.Enable = cfg.Extraction.Enabled
+		}
+		cfg.Warnings = append(cfg.Warnings,
+			"extraction.enabled is deprecated -- use extraction.enable instead",
+		)
+	}
+	cfg.Extraction.Enabled = nil // never propagate the deprecated field
+
 	// extraction.model -> llm.extraction.model (v1.59)
 	if md.IsDefined("extraction", "model") && !md.IsDefined("llm", "extraction", "model") {
 		cfg.LLM.Extraction.Model = cfg.Extraction.Model
@@ -926,6 +947,9 @@ func migrateRenamedKeys(cfg *Config, md toml.MetaData, path string) {
 // replacements. Processed newest-first so that the most recent intermediate
 // name wins when multiple generations of the same variable are set.
 var envRenames = []struct{ old, canonical string }{
+	// v1.78: extraction.enabled -> extraction.enable.
+	{"MICASA_EXTRACTION_ENABLED", "MICASA_EXTRACTION_ENABLE"},
+
 	// v1.77: env var names now derived from dotted config paths.
 	{"MICASA_CURRENCY", "MICASA_LOCALE_CURRENCY"},
 	{"MICASA_MAX_DOCUMENT_SIZE", "MICASA_DOCUMENTS_MAX_FILE_SIZE"},
@@ -934,7 +958,6 @@ var envRenames = []struct{ old, canonical string }{
 	{"MICASA_FILE_PICKER_DIR", "MICASA_DOCUMENTS_FILE_PICKER_DIR"},
 	{"MICASA_EXTRACTION_MAX_EXTRACT_PAGES", "MICASA_EXTRACTION_MAX_PAGES"},
 	{"MICASA_MAX_EXTRACT_PAGES", "MICASA_EXTRACTION_MAX_PAGES"},
-	{"MICASA_TEXT_TIMEOUT", "MICASA_EXTRACTION_TEXT_TIMEOUT"},

 	// v1.59
 	{"MICASA_EXTRACTION_MODEL", "MICASA_LLM_EXTRACTION_MODEL"},
@@ -1102,9 +1125,9 @@ model = "` + DefaultModel + `"
 # file_picker_dir = "/home/user/Documents"

 [extraction]
-# Timeout for pdftotext. Go duration syntax: "30s", "1m", etc. Default: "30s".
-# Increase if you routinely process very large PDFs.
-# text_timeout = "30s"
+# Set to false to disable LLM-powered structured extraction. OCR and pdftotext
+# still run (see [extraction.ocr]) to populate document text for search/display.
+# enable = true

 # Timeout for LLM extraction inference. Go duration syntax: "5m", "90s", etc.
 # Default: "5m". Increase for slow local models or complex documents.
@@ -1113,9 +1136,14 @@ model = "` + DefaultModel + `"
 # Maximum pages for async extraction of scanned documents. 0 = no limit. Default: 0.
 # max_pages = 0

-# Set to false to disable LLM-powered extraction even when LLM is configured.
-# When disabled, no structured data is extracted from documents.
-# enabled = true
+# [extraction.ocr]
+# Set to false to disable OCR on uploaded documents. When disabled, scanned
+# pages and images produce no text. Default: true.
+# enable = true
+
+# Minimum tesseract word confidence (0-100) to keep. Words below this
+# threshold are dropped. 0 = no filtering. Default: 0.
+# confidence_threshold = 0

 [locale]
 # ISO 4217 currency code. Stored in the database on first run; after that the
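
Taken together, the `config.go` hunks give the deprecated `enabled` key a simple precedence: the canonical `enable` wins when set, the deprecated value fills in only when `enable` is unset, and the flag defaults to true when neither is set. A compact sketch of that precedence under those assumptions (`resolveEnable` and `boolPtr` are hypothetical names; the real code spreads this across `migrateRenamedKeys`, `LoadFromPath`, and `IsEnabled`):

```go
package main

import "fmt"

// boolPtr is a small helper for building optional booleans.
func boolPtr(b bool) *bool { return &b }

// resolveEnable applies the precedence suggested by the diff:
// canonical value first, deprecated value as fallback, true by default.
func resolveEnable(enable, enabled *bool) bool {
	if enable == nil {
		enable = enabled // migrate the deprecated spelling
	}
	if enable != nil {
		return *enable
	}
	return true
}

func main() {
	fmt.Println(resolveEnable(nil, nil))                       // neither key set
	fmt.Println(resolveEnable(nil, boolPtr(false)))            // only deprecated key set
	fmt.Println(resolveEnable(boolPtr(true), boolPtr(false))) // both set, canonical wins
}
```

Using `*bool` rather than `bool` is what makes the "unset" case distinguishable from an explicit `false`, which is why the default can be true.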
