Configuration

This guide covers all global configuration options for ReqLLM, including timeouts, connection pools, and runtime settings.

Quick Reference

# config/config.exs
config :req_llm,
  # HTTP timeouts (all values in milliseconds)
  receive_timeout: 120_000,          # Response timeout for non-streaming requests
  stream_receive_timeout: 120_000,   # Streaming chunk timeout
  stream_pool_timeout: 120_000,      # Streaming connection checkout timeout
  stream_pool_protocols: [:http1],   # Default stream pool protocols
  stream_pool_size: 1,               # HTTP/1 connections per stream pool worker
  stream_pool_count: 8,              # Stream pool workers per origin
  stream_pool_strategy: nil,         # Finch shard selection strategy
  metadata_timeout: 120_000,         # Streaming metadata collection timeout
  thinking_timeout: 300_000,         # Extended timeout for reasoning models
  image_receive_timeout: 120_000,    # Image generation timeout

  # Streaming request transforms
  finch_request_adapter: MyApp.FinchAdapter,  # Module implementing ReqLLM.FinchRequestAdapter

  # Key management
  load_dotenv: true,                 # Auto-load .env files at startup

  # Telemetry
  telemetry: [payloads: :none],      # Request payload policy (:none or :raw)

  # Privacy
  redact_context: false,             # Hide message contents in inspect output

  # Debugging
  debug: false                       # Enable verbose logging

Timeout Configuration

ReqLLM uses multiple timeout settings to handle different scenarios:

receive_timeout (default: 30,000ms)

The standard HTTP response timeout for non-streaming requests. Increase this for slow models or large responses.

config :req_llm, receive_timeout: 60_000

Per-request override:

ReqLLM.generate_text("openai:gpt-4o", "Hello", receive_timeout: 60_000)
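
The relationship between the application-level value and the per-request option can be pictured as a simple lookup: the per-request option wins, then the application config, then the built-in default. The sketch below only illustrates that precedence. `TimeoutDemo` and `resolve/1` are hypothetical names, not part of ReqLLM, and the library's internal resolution may differ.

```elixir
defmodule TimeoutDemo do
  # Hypothetical helper, not part of ReqLLM.
  # 30_000 mirrors the documented default for receive_timeout.
  @default_receive_timeout 30_000

  def resolve(opts \\ []) do
    # Application-level value, falling back to the built-in default
    app_value = Application.get_env(:req_llm, :receive_timeout, @default_receive_timeout)
    # A per-request option, when present, takes priority
    Keyword.get(opts, :receive_timeout, app_value)
  end
end

Application.put_env(:req_llm, :receive_timeout, 60_000)
IO.inspect(TimeoutDemo.resolve())                         # application-level value
IO.inspect(TimeoutDemo.resolve(receive_timeout: 90_000))  # per-request override
```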

stream_receive_timeout (default: inherits from receive_timeout)

Timeout between streaming chunks. If no data arrives within this window, the stream fails.

config :req_llm, stream_receive_timeout: 120_000

stream_pool_timeout (default: inherits from stream_receive_timeout)

Timeout for checking out a Finch connection before a streaming request starts. Increase this when short bursts of concurrent streams can queue behind long-running responses.

config :req_llm, stream_pool_timeout: 300_000

Per-request override:

ReqLLM.stream_text(model, messages, pool_timeout: 300_000)
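
The inheritance described in the headings above forms a chain: `stream_pool_timeout` falls back to `stream_receive_timeout`, which in turn falls back to `receive_timeout`. A minimal sketch of that chain, using a hypothetical `StreamTimeouts` module that takes a plain keyword list instead of reading the real application config:

```elixir
defmodule StreamTimeouts do
  # Hypothetical illustration of the documented fallback chain; not ReqLLM code.
  def resolve(config) do
    receive_timeout = Keyword.get(config, :receive_timeout, 30_000)
    stream_receive = Keyword.get(config, :stream_receive_timeout, receive_timeout)
    stream_pool = Keyword.get(config, :stream_pool_timeout, stream_receive)

    %{
      receive_timeout: receive_timeout,
      stream_receive_timeout: stream_receive,
      stream_pool_timeout: stream_pool
    }
  end
end

# Only stream_receive_timeout is set, so stream_pool_timeout inherits it.
IO.inspect(StreamTimeouts.resolve(stream_receive_timeout: 120_000))
```

Setting only `receive_timeout` in this sketch would propagate the same value to both streaming timeouts.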

stream_pool_protocols (default: [:http1])

Protocols for ReqLLM's default Finch stream pool. Use HTTP/1 for broad provider compatibility, or HTTP/2-only when all target providers support HTTP/2.

config :req_llm, stream_pool_protocols: [:http2]

Avoid mixed HTTP/1+HTTP/2 ALPN pools for large prompts. Due to a Finch flow-control issue, [:http2, :http1] and [:http1, :http2] may fail when request bodies exceed 64KB.

stream_pool_size (default: 1)

Maximum HTTP/1 connections per stream pool worker. With the default HTTP/1 transport, concurrent streams per origin are roughly stream_pool_size * stream_pool_count.

config :req_llm, stream_pool_size: 2
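
For capacity planning, the `stream_pool_size * stream_pool_count` estimate can be turned into a small calculation. The helper below is a hypothetical sketch (neither ReqLLM nor Finch provides `PoolCapacity`); it treats the product as a rough upper bound, as the text above does.

```elixir
defmodule PoolCapacity do
  # Hypothetical planning helper; not part of ReqLLM or Finch.

  # Rough upper bound on concurrent HTTP/1 streams per origin.
  def max_streams(size, count) when size > 0 and count > 0, do: size * count

  # Smallest worker count that reaches `target` streams at a given size
  # (integer ceiling division).
  def workers_needed(target, size) when target > 0 and size > 0 do
    div(target + size - 1, size)
  end
end

IO.inspect(PoolCapacity.max_streams(1, 8))      # documented defaults
IO.inspect(PoolCapacity.max_streams(2, 8))      # after raising stream_pool_size to 2
IO.inspect(PoolCapacity.workers_needed(32, 1))  # workers for about 32 streams
```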

stream_pool_count (default: 8)

Number of stream pool workers per origin. Increase this when high concurrent streaming load produces Finch checkout queue timeouts and the downstream provider can handle more simultaneous streams.

config :req_llm, stream_pool_count: 32

stream_pool_strategy (default: nil)

Finch shard selection strategy used when stream_pool_count is greater than 1. Finch defaults to random shard selection. Large streaming deployments can use round-robin to spread stream starts evenly across pool workers:

# config/runtime.exs
round_robin = Finch.Pool.Strategy.RoundRobin.new()

config :req_llm,
  stream_pool_strategy: {Finch.Pool.Strategy.RoundRobin, round_robin}

Per-request override:

ReqLLM.stream_text(model, messages, pool_strategy: Finch.Pool.Strategy.Random)

These settings configure ReqLLM's default Finch pool. If you set config :req_llm, finch: [pools: ...], that explicit Finch pool configuration takes precedence.
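
To see why round-robin spreads stream starts more evenly than random selection, consider the generic sketch below. It is not Finch's implementation: `ShardPick` is a hypothetical module that picks a shard index out of `count` shards using either strategy.

```elixir
defmodule ShardPick do
  # Generic illustration of two selection strategies; not Finch code.

  # Random: any shard may be chosen, so short bursts can cluster on one shard.
  def random(count), do: :rand.uniform(count) - 1

  # Round-robin: a shared counter cycles through shard indices in order.
  def new_counter, do: :counters.new(1, [:atomics])

  def round_robin(counter, count) do
    # Increment, then read. The two steps are separate operations, which is
    # fine for this single-process demo.
    :counters.add(counter, 1, 1)
    rem(:counters.get(counter, 1) - 1, count)
  end
end

counter = ShardPick.new_counter()
picks = for _ <- 1..8, do: ShardPick.round_robin(counter, 4)
IO.inspect(picks)
```

With four shards, eight consecutive round-robin picks visit every shard exactly twice, whereas eight random picks carry no such guarantee.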

thinking_timeout (default: 300,000ms / 5 minutes)

Extended timeout for reasoning models that "think" before responding (e.g., Claude with extended thinking, OpenAI o1/o3 models, Z.AI thinking mode). These models may take several minutes to produce the first token.

config :req_llm, thinking_timeout: 600_000  # 10 minutes

Automatic detection: ReqLLM applies thinking_timeout when:

  • Extended thinking is enabled on Anthropic models
  • Using OpenAI o1/o3 reasoning models
  • Z.AI or Z.AI Coder thinking mode is enabled

metadata_timeout (default: 300,000ms)

Timeout for collecting streaming metadata (usage, finish_reason) after the stream completes. Long-running streams or slow providers may need more time.

config :req_llm, metadata_timeout: 120_000

Per-request override:

ReqLLM.stream_text("anthropic:claude-haiku-4-5", "Hello", metadata_timeout: 60_000)
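
Conceptually, a metadata timeout bounds how long the caller waits for a value that only becomes available once the stream has finished. The sketch below shows that idea with standard `Task` functions. It is a generic Elixir pattern under stated assumptions, not ReqLLM's implementation, and `MetadataWait` is a hypothetical name.

```elixir
defmodule MetadataWait do
  # Wait up to `timeout` ms for the task; stop it if it does not finish in time.
  def await(task, timeout) do
    case Task.yield(task, timeout) || Task.shutdown(task) do
      {:ok, metadata} -> {:ok, metadata}
      {:exit, reason} -> {:error, {:exit, reason}}
      nil -> {:error, :metadata_timeout}
    end
  end
end

# Stand-in for metadata that arrives shortly after the stream completes.
task =
  Task.async(fn ->
    Process.sleep(50)
    %{finish_reason: :stop, usage: %{input_tokens: 3, output_tokens: 5}}
  end)

IO.inspect(MetadataWait.await(task, 1_000))
```

A timeout shorter than the time the metadata takes to arrive yields `{:error, :metadata_timeout}` in this sketch, which is the situation a larger `metadata_timeout` is meant to avoid.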

image_receive_timeout (default: 120,000ms)

Extended timeout specifically for image generation operations, which can take longer than text generation.

config :req_llm, image_receive_timeout: 180_000

Connection Pool Configuration

ReqLLM uses Finch for HTTP connections. By default, HTTP/1-only pools are used because Finch's mixed HTTP/1+HTTP/2 ALPN pools have a known large-body flow-control issue.

Streaming responses hold a connection until the stream completes. With the default HTTP/1 configuration, each origin can run up to size * count concurrent checked-out connections before new streams wait in Finch's checkout queue.

Default Configuration

config :req_llm,
  stream_pool_protocols: [:http1],
  stream_pool_size: 1,
  stream_pool_count: 8

High-Concurrency Configuration

For applications making many concurrent streaming requests:

# config/runtime.exs
round_robin = Finch.Pool.Strategy.RoundRobin.new()

config :req_llm,
  stream_pool_timeout: 300_000,
  stream_pool_protocols: [:http1],
  stream_pool_size: 1,
  stream_pool_count: 32,
  stream_pool_strategy: {Finch.Pool.Strategy.RoundRobin, round_robin}

When this is not enough or when you need origin-specific settings, replace the full Finch pool configuration:

config :req_llm,
  finch: [
    name: ReqLLM.Finch,
    pools: %{
      :default => [protocols: [:http1], size: 1, count: 32]
    }
  ]
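
Finch also accepts origin URLs as keys in the `pools` map, which is how origin-specific settings are expressed. The fragment below is a sketch of that idea: the Anthropic origin and the worker counts are illustrative assumptions, not recommended values.

```elixir
# config/runtime.exs
import Config

config :req_llm,
  finch: [
    name: ReqLLM.Finch,
    pools: %{
      # Assumed example: more workers for one heavily used provider origin
      "https://api.anthropic.com" => [protocols: [:http1], size: 1, count: 32],
      # Every other origin falls back to this pool
      :default => [protocols: [:http1], size: 1, count: 8]
    }
  ]
```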

If you see the error "Finch was unable to provide a connection within the timeout due to excess queuing for connections", adjust the wait time, the pool capacity, or the load you send:

  • Raise stream_pool_timeout when bursty workloads can safely wait for an existing stream to finish.
  • Increase stream_pool_count or stream_pool_size when the downstream provider and your rate limits can handle more simultaneous streams.
  • Add application-level concurrency limits when provider rate limits, costs, or latency make unbounded queueing unsafe.

For example, to allow roughly 32 concurrent HTTP/1 streams per provider origin:

# config/runtime.exs
round_robin = Finch.Pool.Strategy.RoundRobin.new()

config :req_llm,
  stream_pool_timeout: 300_000,
  stream_pool_protocols: [:http1],
  stream_pool_size: 1,
  stream_pool_count: 32,
  stream_pool_strategy: {Finch.Pool.Strategy.RoundRobin, round_robin}
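
The application-level concurrency limit mentioned in the list above can be as simple as `Task.async_stream/3` with `max_concurrency`. The sketch below is one possible approach, not something ReqLLM provides: `BoundedStreams` is a hypothetical module, and `work_fun` stands in for a function that starts and fully consumes one stream (for example, a wrapper around `ReqLLM.stream_text`).

```elixir
defmodule BoundedStreams do
  # Runs `work_fun` for each input with at most `max_concurrency` in flight.
  def run(inputs, work_fun, max_concurrency) do
    inputs
    |> Task.async_stream(work_fun,
      max_concurrency: max_concurrency,
      # Streams can be long-lived, so this sketch disables the per-task timeout
      timeout: :infinity
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Simulated work: each "stream" takes 10 ms, and at most 4 run at once.
results =
  BoundedStreams.run(1..10, fn n ->
    Process.sleep(10)
    n * 2
  end, 4)

IO.inspect(results)
```

Keeping `max_concurrency` at or below `stream_pool_size * stream_pool_count` means callers wait in your application rather than in Finch's checkout queue.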

HTTP/2 Configuration (Advanced)

Use HTTP/2-only when all target providers support HTTP/2:

config :req_llm,
  stream_pool_protocols: [:http2],
  stream_pool_count: 8

Use mixed HTTP/1+HTTP/2 ALPN pools with caution. They may fail with request bodies larger than 64KB:

config :req_llm,
  finch: [
    name: ReqLLM.Finch,
    pools: %{
      :default => [protocols: [:http2, :http1], size: 1, count: 8]
    }
  ]

Custom Finch Instance Per-Request

{:ok, response} = ReqLLM.stream_text(model, messages, finch_name: MyApp.CustomFinch)
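
A custom Finch instance has to be running before it can be referenced by name. A minimal sketch, assuming a standard application supervisor; `MyApp.Application`, `MyApp.Supervisor`, and the pool values are illustrative assumptions:

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Starts a Finch instance registered under the name passed as finch_name
      {Finch,
       name: MyApp.CustomFinch,
       pools: %{:default => [protocols: [:http1], size: 1, count: 16]}}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```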

Streaming Request Transforms

ReqLLM provides two hooks for modifying a Finch.Request struct just before a streaming request is sent, mirroring a similar capability in Req. They are useful for injecting headers, adding tracing metadata, or handling other environment-specific concerns.

finch_request_adapter (config-level)

Set a module that implements the ReqLLM.FinchRequestAdapter behaviour. Because config files cannot hold anonymous functions, this mechanism requires a named module.

# config/test.exs
config :req_llm, finch_request_adapter: MyApp.TestFinchAdapter

# Adapter module, defined in your application code
defmodule MyApp.TestFinchAdapter do
  @behaviour ReqLLM.FinchRequestAdapter

  @impl true
  def call(%Finch.Request{} = request) do
    # Append a header that marks requests sent from the test environment
    %{request | headers: request.headers ++ [{"x-test-env", "true"}]}
  end
end

on_finch_request (per-request)

Pass an anonymous function (Finch.Request.t() -> Finch.Request.t()) as a per-call option:

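ReqLLM.stream_text("openai:gpt-4o", "Hello",
  on_finch_request: fn req ->
    # Random 128-bit hex ID from the standard library; substitute your own ID generator
    request_id = Base.encode16(:crypto.strong_rand_bytes(16), case: :lower)
    %{req | headers: req.headers ++ [{"x-request-id", request_id}]}
  end
)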

Precedence

Both mechanisms can be combined. The config-level adapter is applied first, then the per-request callback. Each step receives the output of the previous one.
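
The ordering can be sketched as a two-step pipeline. To keep the example runnable without Finch, it uses a simplified stand-in struct instead of `Finch.Request`; `TransformOrderDemo` and its functions are hypothetical and do not reflect ReqLLM internals.

```elixir
defmodule TransformOrderDemo do
  # Simplified stand-in for Finch.Request so the sketch runs without Finch.
  defmodule Request do
    defstruct headers: []
  end

  # Plays the role of the config-level adapter's call/1.
  def adapter(%Request{} = req) do
    %{req | headers: req.headers ++ [{"x-test-env", "true"}]}
  end

  # Applies the adapter first, then the optional per-request callback.
  def apply_transforms(%Request{} = req, on_finch_request \\ nil) do
    req = adapter(req)
    if is_function(on_finch_request, 1), do: on_finch_request.(req), else: req
  end
end

callback = fn req -> %{req | headers: req.headers ++ [{"x-request-id", "abc123"}]} end
result = TransformOrderDemo.apply_transforms(%TransformOrderDemo.Request{}, callback)
IO.inspect(result.headers)
```

Because the callback receives the adapter's output, the adapter's header appears first in the resulting list.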

Telemetry Configuration

ReqLLM emits native :telemetry events for every request. The only application-level setting is the payload capture mode:

config :req_llm, telemetry: [payloads: :none]   # default — metadata only
config :req_llm, telemetry: [payloads: :raw]    # include sanitized payloads

Raw payloads are sanitized (reasoning text redacted, binaries summarized, tools reduced to stable metadata), but :none remains the safer default for multi-tenant systems.

Override per request via the telemetry: option:

ReqLLM.generate_text("anthropic:claude-haiku-4-5", "Hello", telemetry: [payloads: :raw])

See the Telemetry Guide for the full event model, payload semantics, and the OpenTelemetry bridge.

API Key Configuration

Keys are resolved in a fixed order of precedence, from highest to lowest: per-request → in-memory → app config → env vars → .env files.
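
The precedence order amounts to "the first source that has a value wins". The sketch below models that rule over a plain map of candidate sources. `KeyLookupDemo` and its source names are hypothetical and only illustrate the ordering, not how ReqLLM stores or reads keys.

```elixir
defmodule KeyLookupDemo do
  # Sources listed from highest to lowest precedence.
  @order [:per_request, :in_memory, :app_config, :env_var, :dotenv]

  # Returns the first source that holds a key, or an error if none does.
  def resolve(sources) when is_map(sources) do
    Enum.find_value(@order, {:error, :missing_key}, fn source ->
      case Map.get(sources, source) do
        nil -> nil
        key -> {:ok, source, key}
      end
    end)
  end
end

# app_config outranks dotenv, so the config value is chosen.
IO.inspect(KeyLookupDemo.resolve(%{app_config: "sk-from-config", dotenv: "sk-from-dotenv"}))
```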

.env Files (Recommended)

# .env
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...

Disable automatic .env loading:

config :req_llm, load_dotenv: false

Application Config

config :req_llm,
  anthropic_api_key: "sk-ant-...",
  openai_api_key: "sk-..."
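
To avoid hard-coding secrets in source-controlled config files, one common approach is to populate the application config from environment variables at boot. A sketch, assuming the keys are exported in the production environment and that failing fast on a missing variable is acceptable:

```elixir
# config/runtime.exs
import Config

if config_env() == :prod do
  config :req_llm,
    # System.fetch_env!/1 raises at boot when the variable is missing
    anthropic_api_key: System.fetch_env!("ANTHROPIC_API_KEY"),
    openai_api_key: System.fetch_env!("OPENAI_API_KEY"),
    load_dotenv: false
end
```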

Runtime / In-Memory

ReqLLM.put_key(:anthropic_api_key, "sk-ant-...")
ReqLLM.put_key(:openai_api_key, "sk-...")

Per-Request Override

ReqLLM.generate_text("openai:gpt-4o", "Hello", api_key: "sk-...")

Provider-Specific Configuration

Configure base URLs or other provider-specific settings:

config :req_llm, :azure,
  base_url: "https://your-resource.openai.azure.com",
  api_version: "2024-08-01-preview"

See individual provider guides for available options.

Debug Mode

Enable verbose logging for troubleshooting:

config :req_llm, debug: true

Or via environment variable:

REQ_LLM_DEBUG=1 mix test

Context Redaction

Hide message contents when a Context struct is inspected, preventing sensitive prompts or responses from leaking into logs:

config :req_llm, redact_context: true

When enabled, inspect/2 shows only the message count:

inspect(context)
#=> "#Context<4 messages [REDACTED]>"

When disabled (the default), the full message preview is shown as normal:

inspect(context)
#=> "#Context<2 msgs: system:\"You are a helpful assistant\", user:\"Hello\">"
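
The mechanism behind this kind of redaction is a custom `Inspect` implementation that consults a config flag. The sketch below shows the general technique on a hypothetical `RedactDemo.Context` struct with its own `:redact_demo` config key. It is not ReqLLM's code, and the output format only approximates the examples above.

```elixir
defmodule RedactDemo.Context do
  # Messages are {role, text} tuples in this simplified stand-in.
  defstruct messages: []
end

defimpl Inspect, for: RedactDemo.Context do
  def inspect(%{messages: messages}, _opts) do
    if Application.get_env(:redact_demo, :redact_context, false) do
      # Redacted: expose only the message count
      "#RedactDemo.Context<#{length(messages)} messages [REDACTED]>"
    else
      previews =
        Enum.map_join(messages, ", ", fn {role, text} ->
          "#{role}:#{Kernel.inspect(text)}"
        end)

      "#RedactDemo.Context<#{length(messages)} msgs: #{previews}>"
    end
  end
end

ctx = %RedactDemo.Context{messages: [{:system, "You are a helpful assistant"}, {:user, "Hello"}]}
Application.put_env(:redact_demo, :redact_context, true)
IO.puts(inspect(ctx))
```

Because `Logger` and `IO.inspect/2` both go through the `Inspect` protocol, a flag checked there covers most accidental leaks into logs.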

Example: Production Configuration

# config/prod.exs
config :req_llm,
  receive_timeout: 120_000,
  stream_receive_timeout: 120_000,
  stream_pool_timeout: 120_000,
  stream_pool_protocols: [:http1],
  stream_pool_size: 1,
  stream_pool_count: 16,
  stream_pool_strategy: nil,
  thinking_timeout: 300_000,
  metadata_timeout: 120_000,
  telemetry: [payloads: :none],
  load_dotenv: false  # Use proper secrets management in production

Example: Development Configuration

# config/dev.exs
config :req_llm,
  receive_timeout: 60_000,
  debug: true,
  load_dotenv: true