refactor(config): fix LLM timeout architecture #732

@cpcloud

Description

The current timeout architecture conflates two different concerns and layers them inconsistently.

Current state

  • llm.timeout (default 5m) is passed as the HTTP client ResponseTimeout, so it applies to every request, including streaming chat. That is far too long for quick operations, and it forces one value to serve two distinct concerns.
  • QuickOpTimeout (30s) is hardcoded in internal/llm/client.go for ping and model listing. It is not configurable and ignores llm.timeout entirely.
  • extraction.llm_timeout (default 5m) is a separate context deadline for extraction inference, yet llm.timeout also applies as the HTTP-level timeout on the same request.
  • Per-pipeline llm.chat.timeout and llm.extraction.timeout override the HTTP client timeout, adding yet another layer of confusion.

What we need

  1. A single shared timeout for fast LLM operations (ping, model listing, auto-detect), shared by chat and extraction. It should be configurable, replacing the hardcoded QuickOpTimeout. Keep llm.timeout for this with a short default (e.g. 30s).

  2. Per-use-case timeouts for LLM processing, i.e. how long chat or extraction inference is allowed to take. Each should be independently configurable:

    • llm.chat.timeout → chat response timeout (context deadline)
    • llm.extraction.timeout → extraction inference timeout (context deadline, replacing extraction.llm_timeout)

Migration

  • extraction.llm_timeout → deprecated, replaced by llm.extraction.timeout
  • llm.timeout → becomes the quick-op timeout, with a short default (30s)
  • HTTP client timeout → derived (e.g. the max of the quick-op and per-pipeline timeouts), not independently configured
  • QuickOpTimeout constant → removed, replaced by the configured value
