05 Providers
This document describes the Provider system, which abstracts LLM APIs behind a unified interface. ZeroClaw supports 28+ built-in providers and custom endpoints via a trait-driven architecture. The system includes automatic retry logic, credential management, and multi-provider fallback chains.
For information about configuring providers in config.toml, see Configuration File Reference. For streaming responses and tool calling behavior, see Tool Calling Architecture.
The Provider system enables ZeroClaw to interact with any LLM service through a common interface. Key responsibilities include:
- Unified API abstraction: Single trait interface for all LLM interactions
- Credential resolution: Automatic API key discovery from config, environment variables, and OAuth flows
- Resilience: Retry logic, exponential backoff, and provider fallback chains
- Security: Credential scrubbing from error messages to prevent secrets from leaking into LLM context
- Extensibility: Support for custom OpenAI-compatible and Anthropic-compatible endpoints
Sources: src/providers/mod.rs:1-1000, src/providers/traits.rs:1-450, README.md:308-322
The provider system uses a three-layer architecture, with factory functions assembling a Router → Resilience → Implementation stack.
```mermaid
graph TB
subgraph "Factory Layer"
create_provider["create_provider()"]
create_resilient_provider["create_resilient_provider()"]
create_routed_provider["create_routed_provider()"]
end
subgraph "Routing Layer"
RouterProvider["RouterProvider<br/>model-specific routing"]
end
subgraph "Resilience Layer"
ReliableProvider["ReliableProvider<br/>retries + fallbacks<br/>API key rotation"]
end
subgraph "Implementation Layer"
AnthropicProvider["AnthropicProvider"]
OpenAiProvider["OpenAiProvider"]
OpenRouterProvider["OpenRouterProvider"]
GeminiProvider["GeminiProvider"]
OllamaProvider["OllamaProvider"]
OpenAiCompatibleProvider["OpenAiCompatibleProvider<br/>20+ services"]
CopilotProvider["CopilotProvider"]
OpenAiCodexProvider["OpenAiCodexProvider"]
end
subgraph "External APIs"
AnthropicAPI["Anthropic API<br/>api.anthropic.com"]
OpenAIAPI["OpenAI API<br/>api.openai.com"]
OpenRouterAPI["OpenRouter API<br/>openrouter.ai"]
GeminiAPI["Gemini API<br/>generativelanguage.googleapis.com"]
OllamaAPI["Ollama API<br/>localhost:11434"]
CompatibleAPIs["Venice, Groq, Mistral,<br/>DeepSeek, Together, etc."]
end
create_provider --> ReliableProvider
create_resilient_provider --> ReliableProvider
create_routed_provider --> RouterProvider
RouterProvider --> ReliableProvider
ReliableProvider --> AnthropicProvider
ReliableProvider --> OpenAiProvider
ReliableProvider --> OpenRouterProvider
ReliableProvider --> GeminiProvider
ReliableProvider --> OllamaProvider
ReliableProvider --> OpenAiCompatibleProvider
ReliableProvider --> CopilotProvider
ReliableProvider --> OpenAiCodexProvider
AnthropicProvider --> AnthropicAPI
OpenAiProvider --> OpenAIAPI
OpenRouterProvider --> OpenRouterAPI
GeminiProvider --> GeminiAPI
OllamaProvider --> OllamaAPI
OpenAiCompatibleProvider --> CompatibleAPIs
```
Three-Layer Design Pattern
Sources: src/providers/mod.rs:572-840, src/providers/reliable.rs:1-250, src/providers/router.rs:1-200
All providers implement the Provider trait, which defines the core LLM interaction methods.
```mermaid
classDiagram
class Provider {
<<trait>>
+chat_with_system(system, message, model, temperature) Result~String~
+chat_with_history(messages, model, temperature) Result~String~
+chat(request, model, temperature) Result~ChatResponse~
+simple_chat(message, model, temperature) Result~String~
+warmup() Result
+capabilities() ProviderCapabilities
+convert_tools(tools) ToolsPayload
+supports_native_tools() bool
}
class ChatRequest {
+messages: &[ChatMessage]
+tools: Option~&[ToolSpec]~
}
class ChatResponse {
+text: Option~String~
+tool_calls: Vec~ToolCall~
+has_tool_calls() bool
+text_or_empty() &str
}
class ProviderCapabilities {
+native_tool_calling: bool
}
Provider ..> ChatRequest : accepts
Provider ..> ChatResponse : returns
Provider ..> ProviderCapabilities : declares
```
Core Provider Interface
| Method | Purpose | When to Use |
|---|---|---|
| `simple_chat()` | One-shot interaction without system prompt | Direct user queries, non-agentic use |
| `chat_with_system()` | One-shot with explicit system prompt | Single-turn with custom instructions |
| `chat_with_history()` | Multi-turn conversation | Stateful interactions with context |
| `chat()` | Structured agent loop API | Tool calling, complex orchestration |
| `warmup()` | Pre-establish connection pools | Prevent cold-start timeouts |
The trait provides multiple entry points, with default implementations that delegate to lower-level methods:
```
simple_chat()       → chat_with_system()  → [Provider Implementation]
chat_with_history() → chat_with_system()  → [Provider Implementation]
chat()              → chat_with_history() → [Provider Implementation]
```
Sources: src/providers/traits.rs:229-350, src/providers/traits.rs:52-71
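The delegation pattern can be sketched with Rust default trait methods. This is a deliberately simplified, hypothetical shape (the real trait is async, takes model and temperature parameters, and returns `Result`); it only illustrates how higher-level entry points forward to lower-level ones:

```rust
// Simplified sketch of the default-method delegation: a concrete provider
// only needs to implement chat_with_system(); the higher-level entry point
// is provided for free. Signatures are illustrative, not the real trait.
trait Provider {
    fn chat_with_system(&self, system: &str, message: &str) -> String;

    // simple_chat() delegates downward with an empty system prompt.
    fn simple_chat(&self, message: &str) -> String {
        self.chat_with_system("", message)
    }
}

struct EchoProvider;

impl Provider for EchoProvider {
    fn chat_with_system(&self, system: &str, message: &str) -> String {
        format!("system={system:?} message={message:?}")
    }
}

fn main() {
    let p = EchoProvider;
    // The default implementation routes through chat_with_system().
    assert_eq!(p.simple_chat("hi"), "system=\"\" message=\"hi\"");
    println!("ok");
}
```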
Three factory functions create providers with progressively more capabilities:
```mermaid
flowchart TD
Start["Agent/Channel needs Provider"] --> FactoryChoice{Creation Method?}
FactoryChoice -->|Basic| create_provider["create_provider(name, api_key)"]
FactoryChoice -->|Resilient| create_resilient_provider["create_resilient_provider(name, api_key,<br/>api_url, reliability)"]
FactoryChoice -->|Routed| create_routed_provider["create_routed_provider(name, api_key,<br/>api_url, reliability, routes, default_model)"]
create_provider --> DirectProvider["Direct Provider Instance<br/>(no retries)"]
create_resilient_provider --> ReliableWrapper["ReliableProvider<br/>wraps provider(s)"]
create_routed_provider --> RouterCheck{Model routes<br/>configured?}
RouterCheck -->|Yes| RouterWrapper["RouterProvider<br/>wraps ReliableProvider<br/>per route"]
RouterCheck -->|No| ReliableWrapper
ReliableWrapper --> CheckFallbacks{Fallback<br/>providers?}
CheckFallbacks -->|Yes| MultiplePrimary["Primary + Fallback Chain<br/>(e.g. openrouter → anthropic)"]
CheckFallbacks -->|No| SinglePrimary["Single Provider<br/>(with retries)"]
DirectProvider --> ProviderImpl["Provider Implementation<br/>(OpenAI, Anthropic, etc.)"]
MultiplePrimary --> ProviderImpl
SinglePrimary --> ProviderImpl
RouterWrapper --> ProviderImpl
ProviderImpl --> ResolveAuth["resolve_provider_credential()<br/>checks config → env vars → OAuth"]
ResolveAuth --> ProviderReady["Provider Ready"]
```
Provider Creation Flow
`create_provider(name, api_key)`
- Creates a direct provider instance
- No retry or fallback logic
- Used for testing and simple scenarios

`create_resilient_provider(name, api_key, api_url, reliability)`
- Wraps the provider in `ReliableProvider`
- Enables retries, exponential backoff, and API key rotation
- Supports fallback provider chains (e.g., `openrouter` → `anthropic`)
- Default behavior for production use

`create_routed_provider(name, api_key, api_url, reliability, routes, default_model)`
- Creates a `RouterProvider` if `routes` is non-empty
- Routes requests to different providers based on model hints
- Each route has its own resilience wrapper
- Falls back to `create_resilient_provider()` if no routes are configured
Sources: src/providers/mod.rs:572-590, src/providers/mod.rs:781-840, src/providers/mod.rs:842-910
The resolve_provider_credential() function implements a three-tier priority system for authentication:
```mermaid
flowchart TD
Start["resolve_provider_credential(name, api_key)"] --> CheckOverride{api_key param<br/>provided?}
CheckOverride -->|Yes| CheckMiniMax{Provider is<br/>MiniMax?}
CheckMiniMax -->|Yes + OAuth placeholder| MinimaxOAuth["Try MINIMAX_OAUTH_TOKEN<br/>→ MINIMAX_API_KEY<br/>→ refresh_minimax_oauth_access_token()"]
CheckMiniMax -->|No| UseOverride["Use trimmed api_key"]
MinimaxOAuth --> ReturnCred
UseOverride --> ReturnCred
CheckOverride -->|No| CheckProviderEnv["Check provider-specific env var<br/>(e.g. ANTHROPIC_API_KEY,<br/>OPENROUTER_API_KEY)"]
CheckProviderEnv --> FoundProviderEnv{Found?}
FoundProviderEnv -->|Yes| ReturnCred["Return credential"]
FoundProviderEnv -->|No| CheckGenericEnv["Check generic env vars<br/>ZEROCLAW_API_KEY, API_KEY"]
CheckGenericEnv --> FoundGenericEnv{Found?}
FoundGenericEnv -->|Yes| ReturnCred
FoundGenericEnv -->|No| CheckOAuth["Check OAuth flows<br/>(OpenAI Codex, Anthropic)"]
CheckOAuth --> FoundOAuth{Found?}
FoundOAuth -->|Yes| ReturnCred
FoundOAuth -->|No| ReturnNone["Return None"]
ReturnCred --> End["Credential"]
ReturnNone --> End
```
Credential Resolution Flow
1. Explicit parameter: the `api_key` argument (trimmed, ignored if empty)
2. Provider-specific env vars: `ANTHROPIC_API_KEY`, `OPENROUTER_API_KEY`, etc.
3. Generic env vars: `ZEROCLAW_API_KEY`, `API_KEY`
4. OAuth flows: provider-specific authentication (OpenAI Codex, Anthropic setup tokens)
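The priority order can be sketched as a small pure function. This is illustrative only: the function name and shape are hypothetical, the `lookup` closure stands in for `std::env::var` so the example stays deterministic, and the OAuth tier is omitted:

```rust
// Illustrative sketch of the tiered credential lookup: the explicit
// parameter wins, then the provider-specific env var, then the generic
// fallbacks. `lookup` stands in for std::env::var.
fn resolve_credential(
    explicit: Option<&str>,
    provider_env: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    // Tier 1: explicit api_key parameter, trimmed and ignored if empty.
    if let Some(key) = explicit {
        let trimmed = key.trim();
        if !trimmed.is_empty() {
            return Some(trimmed.to_string());
        }
    }
    // Tiers 2-3: provider-specific env var, then generic env vars.
    for var in [provider_env, "ZEROCLAW_API_KEY", "API_KEY"] {
        if let Some(val) = lookup(var) {
            if !val.trim().is_empty() {
                return Some(val);
            }
        }
    }
    None // the real system would consult OAuth flows here
}

fn main() {
    let env = |name: &str| match name {
        "ANTHROPIC_API_KEY" => Some("env-key".to_string()),
        _ => None,
    };
    // An explicit key wins even when the env var is set.
    assert_eq!(
        resolve_credential(Some(" sk-x "), "ANTHROPIC_API_KEY", env),
        Some("sk-x".to_string())
    );
    // A blank explicit key falls through to the provider env var.
    assert_eq!(
        resolve_credential(Some("   "), "ANTHROPIC_API_KEY", env),
        Some("env-key".to_string())
    );
    println!("ok");
}
```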
| Provider | Environment Variables (in priority order) |
|---|---|
| `anthropic` | `ANTHROPIC_OAUTH_TOKEN`, `ANTHROPIC_API_KEY` |
| `openrouter` | `OPENROUTER_API_KEY` |
| `openai` | `OPENAI_API_KEY` |
| `ollama` | `OLLAMA_API_KEY` |
| `venice` | `VENICE_API_KEY` |
| `groq` | `GROQ_API_KEY` |
| `mistral` | `MISTRAL_API_KEY` |
| `deepseek` | `DEEPSEEK_API_KEY` |
| `xai`, `grok` | `XAI_API_KEY` |
| `together`, `together-ai` | `TOGETHER_API_KEY` |
| `fireworks`, `fireworks-ai` | `FIREWORKS_API_KEY` |
| `perplexity` | `PERPLEXITY_API_KEY` |
| `cohere` | `COHERE_API_KEY` |
| `moonshot`, `kimi` | `MOONSHOT_API_KEY` |
| `glm`, `zhipu` | `GLM_API_KEY` |
| `minimax` | `MINIMAX_OAUTH_TOKEN`, `MINIMAX_API_KEY` |
| `qwen`, `dashscope` | `DASHSCOPE_API_KEY` |
| `zai`, `z.ai` | `ZAI_API_KEY` |
| `nvidia`, `nvidia-nim` | `NVIDIA_API_KEY` |
MiniMax OAuth Refresh
When `api_key = "minimax-oauth"` is configured, the system attempts to refresh the access token automatically:
1. Check the `MINIMAX_OAUTH_TOKEN` env var
2. Check the `MINIMAX_API_KEY` env var
3. If `MINIMAX_OAUTH_REFRESH_TOKEN` is set, call `refresh_minimax_oauth_access_token()`:
   - Hits `https://api.minimax.io/oauth/token` (global) or `https://api.minimaxi.com/oauth/token` (China)
   - Uses `grant_type=refresh_token`
   - Returns a new access token or an error
Anthropic Setup Tokens
Anthropic supports both API keys and OAuth setup tokens (starting with `sk-ant-oat01-`). The provider detects setup tokens and uses the `Authorization: Bearer` header instead of `x-api-key`.
Sources: src/providers/mod.rs:452-547, src/providers/mod.rs:165-278, src/providers/anthropic.rs:166-182
ReliableProvider wraps one or more providers with retry logic, exponential backoff, API key rotation, and model fallbacks.
```mermaid
graph TB
Request["Agent requests chat(messages, model, temperature)"] --> BuildChain["Build model chain<br/>[primary, fallback1, fallback2]"]
BuildChain --> LoopModels["For each model in chain"]
LoopModels --> LoopProviders["For each provider"]
LoopProviders --> LoopRetries["For each retry attempt"]
LoopRetries --> TryCall["provider.chat()"]
TryCall --> CheckSuccess{Success?}
CheckSuccess -->|Yes| LogRecovery["Log recovery if retry > 0<br/>or model != primary"]
LogRecovery --> ReturnSuccess["Return response"]
CheckSuccess -->|No| ClassifyError["Classify error"]
ClassifyError --> ErrorType{Error type?}
ErrorType -->|Non-retryable| LogNonRetry["Log non-retryable<br/>(auth failure, invalid model)"]
ErrorType -->|Rate limit<br/>non-retryable| LogBusinessLimit["Log business limit<br/>(plan issue, quota)"]
ErrorType -->|Rate limit<br/>retryable| RotateKey["Attempt API key rotation"]
ErrorType -->|Retryable| ComputeBackoff["Compute backoff"]
LogNonRetry --> BreakRetries["Break retry loop<br/>(try next provider)"]
LogBusinessLimit --> BreakRetries
RotateKey --> LogRotation["Log key rotation"]
LogRotation --> CheckRetriesLeft{Retries<br/>remaining?}
ComputeBackoff --> CheckRetriesLeft
CheckRetriesLeft -->|Yes| Wait["Wait with exponential backoff<br/>Min: base_backoff_ms<br/>Max: 10,000ms<br/>Respects Retry-After header"]
CheckRetriesLeft -->|No| LogExhausted["Log retries exhausted"]
Wait --> LoopRetries
LogExhausted --> LoopProviders
BreakRetries --> LoopProviders
LoopProviders -->|Exhausted| LogModelFailed["Log model failed"]
LogModelFailed --> LoopModels
LoopModels -->|Exhausted| ReturnError["Return error with<br/>full failure chain"]
```
Resilience Flow
Non-Retryable Errors (immediate failure):
- 4xx HTTP status codes (except 429, 408)
- Authentication failures: `"invalid api key"`, `"unauthorized"`, `"forbidden"`, etc.
- Model catalog mismatches: `"model not found"`, `"model unknown"`, `"unsupported model"`
- Business-limit 429s: `"plan does not include"`, `"insufficient balance"`, `"quota exhausted"`
Retryable Errors (exponential backoff):
- 5xx HTTP status codes
- Network timeouts
- 429 rate limits (transient capacity)
- 408 request timeout
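These categories reduce to a small classifier. The sketch below is a simplified stand-in for the real logic in src/providers/reliable.rs; the substring lists are abbreviated from the examples above:

```rust
// Simplified error classifier for the categories above. The real
// implementation inspects provider error bodies more thoroughly.
#[derive(Debug, PartialEq)]
enum ErrorClass {
    NonRetryable,  // auth failures, model mismatches, other 4xx
    BusinessLimit, // 429s caused by plan/quota issues
    Retryable,     // 5xx, timeouts, transient 429s
}

fn classify(status: u16, body: &str) -> ErrorClass {
    let b = body.to_lowercase();
    if status == 429 {
        let business = ["plan does not include", "insufficient balance", "quota exhausted"];
        if business.iter().any(|p| b.contains(p)) {
            return ErrorClass::BusinessLimit;
        }
        return ErrorClass::Retryable; // transient capacity limit
    }
    match status {
        408 => ErrorClass::Retryable,          // request timeout
        400..=499 => ErrorClass::NonRetryable, // other client errors
        500..=599 => ErrorClass::Retryable,    // server errors
        _ => ErrorClass::Retryable,
    }
}

fn main() {
    assert_eq!(classify(503, "overloaded"), ErrorClass::Retryable);
    assert_eq!(classify(401, "invalid api key"), ErrorClass::NonRetryable);
    assert_eq!(classify(429, "quota exhausted for this cycle"), ErrorClass::BusinessLimit);
    assert_eq!(classify(429, "too many requests"), ErrorClass::Retryable);
    println!("ok");
}
```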
Backoff Calculation:
```
attempt 0: base_backoff_ms (default 50ms)
attempt 1: base_backoff_ms * 2
attempt 2: base_backoff_ms * 4
...
max: 10,000ms

If Retry-After header present:
  use max(Retry-After, base_backoff_ms), capped at 30s
```
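The schedule above translates to a small pure function. This is a sketch of the documented behavior, not the actual ReliableProvider code (the function name is illustrative):

```rust
// Sketch of the documented schedule: exponential doubling from
// base_backoff_ms capped at 10s; a Retry-After hint takes precedence,
// floored at the base and capped at 30s.
fn backoff_ms(base_backoff_ms: u64, attempt: u32, retry_after_ms: Option<u64>) -> u64 {
    if let Some(hint) = retry_after_ms {
        return hint.max(base_backoff_ms).min(30_000);
    }
    base_backoff_ms
        .saturating_mul(1u64 << attempt.min(20)) // cap the shift to avoid overflow
        .min(10_000)
}

fn main() {
    assert_eq!(backoff_ms(50, 0, None), 50);      // attempt 0: base
    assert_eq!(backoff_ms(50, 2, None), 200);     // attempt 2: base * 4
    assert_eq!(backoff_ms(50, 10, None), 10_000); // capped at 10s
    assert_eq!(backoff_ms(50, 0, Some(45_000)), 30_000); // Retry-After capped at 30s
    println!("ok");
}
```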
When a rate limit error (429) occurs, ReliableProvider attempts to rotate to the next API key in the configured api_keys pool:
```rust
pub fn with_api_keys(mut self, keys: Vec<String>) -> Self
```

Rotation uses round-robin indexing with atomic fetch-add. This enables high-throughput scenarios where a single provider can exhaust multiple API keys sequentially.
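The rotation can be sketched with an atomic cursor. Struct and field names below are hypothetical, not the real ReliableProvider internals:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative sketch of round-robin key rotation via atomic fetch-add.
struct KeyPool {
    keys: Vec<String>,
    cursor: AtomicUsize,
}

impl KeyPool {
    fn next_key(&self) -> &str {
        // fetch_add is lock-free, so concurrent callers each get a distinct index.
        let i = self.cursor.fetch_add(1, Ordering::Relaxed) % self.keys.len();
        &self.keys[i]
    }
}

fn main() {
    let pool = KeyPool {
        keys: vec!["key-a".into(), "key-b".into()],
        cursor: AtomicUsize::new(0),
    };
    assert_eq!(pool.next_key(), "key-a");
    assert_eq!(pool.next_key(), "key-b");
    assert_eq!(pool.next_key(), "key-a"); // wraps around
    println!("ok");
}
```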
Configure per-model fallback chains in config.toml:
```toml
[[reliability.model_fallbacks]]
model = "anthropic/claude-sonnet-4-6"
fallbacks = ["anthropic/claude-3.5-sonnet", "anthropic/claude-3-opus"]
```

When a model fails (exhausted retries across all providers), the system tries the next model in the fallback chain.
Sources: src/providers/reliable.rs:183-249, src/providers/reliable.rs:8-114, src/providers/reliable.rs:252-371
| Provider | Base URL | Auth Style | Native Tools | Special Features |
|---|---|---|---|---|
| `AnthropicProvider` | `api.anthropic.com` | `x-api-key` or Bearer (setup tokens) | ✅ | Prompt caching, tool result blocks |
| `OpenAiProvider` | `api.openai.com/v1` | Bearer | ✅ | Standard OpenAI format |
| `OpenRouterProvider` | `openrouter.ai/api/v1` | Bearer | ✅ | Model aggregator, warmup endpoint |
| `GeminiProvider` | `generativelanguage.googleapis.com/v1beta` | API key or OAuth | ✅ | CLI token reuse, ADC support |
| `OllamaProvider` | `localhost:11434` | Bearer (optional) | ❌ (prompt-guided) | Local/remote, `:cloud` suffix |
OpenAiCompatibleProvider provides a single implementation for 20+ services that follow the OpenAI /v1/chat/completions format:
Supported Services:
- Venice, Vercel AI, Cloudflare AI Gateway
- Moonshot (Kimi), Kimi Code
- Synthetic, OpenCode Zen
- Z.AI (GLM/Zhipu), MiniMax
- Qwen/DashScope, Baidu Qianfan
- Groq, Mistral, xAI (Grok), DeepSeek
- Together AI, Fireworks AI, Perplexity, Cohere
- LM Studio, NVIDIA NIM, Astrai, OVH Cloud
Configuration:
```rust
OpenAiCompatibleProvider::new(
    "Provider Name",
    "https://api.example.com/v1", // base URL
    Some("api-key"),              // credential
    AuthStyle::Bearer             // or AuthStyle::XApiKey
)
```

The provider automatically detects if the base URL already includes `/chat/completions` to support non-standard endpoints (e.g., VolcEngine ARK uses `/api/coding/v3/chat/completions`).
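The endpoint detection reduces to a small URL helper. A minimal sketch (the function name is illustrative, not the actual implementation):

```rust
// Sketch of the detection described above: reuse the base URL verbatim if it
// already names a chat-completions path, otherwise append the standard one.
fn chat_completions_url(base: &str) -> String {
    if base.contains("/chat/completions") {
        base.to_string()
    } else {
        format!("{}/chat/completions", base.trim_end_matches('/'))
    }
}

fn main() {
    assert_eq!(
        chat_completions_url("https://api.example.com/v1"),
        "https://api.example.com/v1/chat/completions"
    );
    // A non-standard endpoint (VolcEngine ARK style) is used as-is.
    assert_eq!(
        chat_completions_url("https://ark.example.com/api/coding/v3/chat/completions"),
        "https://ark.example.com/api/coding/v3/chat/completions"
    );
    println!("ok");
}
```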
OpenAI-compatible:

```toml
default_provider = "custom:https://your-api.com"
```

Anthropic-compatible:

```toml
default_provider = "anthropic-custom:https://your-api.com"
```

Sources: src/providers/anthropic.rs:10-164, src/providers/openai.rs:10-137, src/providers/openrouter.rs:10-112, src/providers/gemini.rs:10-244, src/providers/ollama.rs:6-85, src/providers/compatible.rs:17-202, src/providers/mod.rs:743-777
AnthropicProvider automatically applies prompt caching to reduce costs on repeated interactions:
- System prompt caching: Applied when system prompt exceeds 3KB (~1024 tokens)
- Conversation caching: Applied to the last message when history exceeds 4 messages
- Tool definition caching: Applied to the last tool in the schema (caches entire tool list)
Caching uses Anthropic's cache_control: { "type": "ephemeral" } blocks.
```rust
fn should_cache_system(text: &str) -> bool {
    text.len() > 3072
}

fn should_cache_conversation(messages: &[ChatMessage]) -> bool {
    messages.iter().filter(|m| m.role != "system").count() > 4
}
```

GeminiProvider supports three authentication methods with automatic fallback:
1. Explicit API key: passed via config or parameter
2. Environment variables: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
3. Gemini CLI OAuth: reuses tokens from `~/.gemini/oauth_creds.json`
OAuth tokens use the internal `cloudcode-pa.googleapis.com/v1internal` endpoint instead of the public API, as they are scoped for Code Assist features.
OllamaProvider supports both local and remote Ollama deployments:
- Local: `api_url` unset → `http://localhost:11434`
- Remote: `api_url = "https://ollama.com"` + `api_key = "..."`
- Cloud suffix: model IDs like `qwen3:cloud` normalize to `qwen3` and enforce remote + API key requirements
The provider detects localhost endpoints and disables authentication automatically, even when api_key is configured.
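The `:cloud` suffix handling can be sketched as follows (the function name is illustrative, not the actual Ollama provider code):

```rust
// Sketch of the normalization described above: strip a trailing ":cloud"
// and report whether the model therefore requires the remote endpoint.
fn normalize_ollama_model(model: &str) -> (String, bool) {
    match model.strip_suffix(":cloud") {
        Some(base) => (base.to_string(), true), // cloud: remote + API key required
        None => (model.to_string(), false),     // plain model id, local allowed
    }
}

fn main() {
    assert_eq!(normalize_ollama_model("qwen3:cloud"), ("qwen3".to_string(), true));
    assert_eq!(normalize_ollama_model("llama3.2"), ("llama3.2".to_string(), false));
    println!("ok");
}
```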
Sources: src/providers/anthropic.rs:184-230, src/providers/gemini.rs:145-164, src/providers/ollama.rs:66-113
All provider errors pass through scrub_secret_patterns() to prevent credential leakage to LLMs:
Scrubbed Prefixes:
- `sk-` (OpenAI, Anthropic)
- `xoxb-`, `xoxp-` (Slack)
- `ghp_`, `gho_`, `ghu_`, `github_pat_` (GitHub)
Example:
```rust
// Input:  "Invalid API key: sk-1234abcd5678efgh"
// Output: "Invalid API key: [REDACTED]"
```

The scrubber uses a sliding window to detect token boundaries (alphanumeric, `-`, `_`, `.`, `:`) and replaces the entire token with `[REDACTED]`.
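A dependency-free re-implementation of the idea, using the prefix list above. This is a naive sketch, not the actual `scrub_secret_patterns()` code (for instance, it matches a prefix even mid-word):

```rust
// Illustrative scrubber: on a known secret prefix, consume the whole token
// (alphanumeric plus -, _, ., :) and emit [REDACTED] instead.
fn scrub(input: &str) -> String {
    const PREFIXES: [&str; 7] = ["sk-", "xoxb-", "xoxp-", "ghp_", "gho_", "ghu_", "github_pat_"];
    let is_token_char = |c: char| c.is_ascii_alphanumeric() || "-_.:".contains(c);
    let mut out = String::new();
    let mut rest = input;
    'outer: while !rest.is_empty() {
        for p in PREFIXES {
            if rest.starts_with(p) {
                // Consume through the end of the token, then redact it.
                let end = rest.find(|c: char| !is_token_char(c)).unwrap_or(rest.len());
                out.push_str("[REDACTED]");
                rest = &rest[end..];
                continue 'outer;
            }
        }
        let mut chars = rest.chars();
        out.push(chars.next().unwrap());
        rest = chars.as_str();
    }
    out
}

fn main() {
    assert_eq!(scrub("Invalid API key: sk-1234abcd5678efgh"), "Invalid API key: [REDACTED]");
    assert_eq!(scrub("token ghp_abc123 rejected"), "token [REDACTED] rejected");
    println!("ok");
}
```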
sanitize_api_error() combines scrubbing with length truncation to prevent large error bodies from consuming context:
```rust
pub fn sanitize_api_error(input: &str) -> String {
    let scrubbed = scrub_secret_patterns(input);
    if scrubbed.chars().count() <= 200 {
        return scrubbed;
    }
    format!("{}...", &scrubbed[..200])
}
```

All provider implementations use `api_error()` to format HTTP errors:
```rust
pub async fn api_error(provider: &str, response: reqwest::Response) -> anyhow::Error {
    let status = response.status();
    let body = response
        .text()
        .await
        .unwrap_or_else(|_| "<failed to read provider error body>".to_string());
    let sanitized = sanitize_api_error(&body);
    anyhow::anyhow!("{provider} API error ({status}): {sanitized}")
}
```

Provider-specific and generic API key environment variables are read lazily and never logged. The credential resolution function returns `Option<String>`, and providers fail fast with clear error messages when credentials are missing:
```rust
// Example from OpenAiProvider
let credential = self.credential.as_ref().ok_or_else(|| {
    anyhow::anyhow!("OpenAI API key not set. Set OPENAI_API_KEY or edit config.toml.")
})?;
```

Sources: src/providers/mod.rs:367-439, src/providers/mod.rs:441-450
ProviderRuntimeOptions enables advanced provider creation with OAuth profile overrides and custom state directories:
```rust
pub struct ProviderRuntimeOptions {
    pub auth_profile_override: Option<String>,
    pub zeroclaw_dir: Option<PathBuf>,
    pub secrets_encrypt: bool,
}
```

Usage:
```rust
let options = ProviderRuntimeOptions {
    auth_profile_override: Some("openai-codex:work".to_string()),
    zeroclaw_dir: Some(PathBuf::from("/custom/zeroclaw")),
    secrets_encrypt: true,
};
let provider = create_provider_with_options("openai-codex", None, &options)?;
```

This pattern enables:
- Multi-profile support: Switch between work/personal accounts for OpenAI Codex
- Custom state directories: Use non-default locations for auth profiles and secret stores
- Encryption control: Toggle secret encryption per-provider
Currently used by OpenAiCodexProvider for OAuth token management.
Sources: src/providers/mod.rs:350-365, src/providers/mod.rs:577-589
Several providers have region-specific base URLs with alias support:
| Canonical Name | Aliases | Intl Base URL | China Base URL |
|---|---|---|---|
| `minimax` | `minimax-intl`, `minimax-io`, `minimax-global` | `api.minimax.io` | `api.minimaxi.com` |
| `glm` | `zhipu`, `glm-global`, `zhipu-global` | `api.z.ai/api/paas/v4` | `open.bigmodel.cn/api/paas/v4` |
| `moonshot` | `kimi`, `moonshot-cn`, `kimi-cn` | `api.moonshot.ai` | `api.moonshot.cn` |
| `qwen` | `dashscope`, `qwen-cn`, `dashscope-cn` | `dashscope-intl.aliyuncs.com` | `dashscope.aliyuncs.com` |
| `zai` | `z.ai`, `zai-global`, `z.ai-global` | `api.z.ai/api/coding/paas/v4` | `open.bigmodel.cn/api/coding/paas/v4` |
Example:

```toml
default_provider = "minimax-cn"   # Uses api.minimaxi.com
# or
default_provider = "minimax-intl" # Uses api.minimax.io
```

`canonical_china_provider_name()` normalizes all aliases to a single canonical name for metrics and logging:
```rust
pub(crate) fn canonical_china_provider_name(name: &str) -> Option<&'static str> {
    if is_qwen_alias(name) {
        Some("qwen")
    } else if is_glm_alias(name) {
        Some("glm")
    } else if is_moonshot_alias(name) {
        Some("moonshot")
    } else if is_minimax_alias(name) {
        Some("minimax")
    } else if is_zai_alias(name) {
        Some("zai")
    } else if is_qianfan_alias(name) {
        Some("qianfan")
    } else {
        None
    }
}
```

Sources: src/providers/mod.rs:47-348
The Provider trait includes streaming methods for progressive response delivery:
```rust
async fn chat_stream(
    &self,
    request: ChatRequest<'_>,
    model: &str,
    temperature: f64,
    options: StreamOptions,
) -> StreamResult<stream::BoxStream<'static, StreamResult<StreamChunk>>>;
```

```rust
pub struct StreamChunk {
    pub delta: String,       // Text delta for this chunk
    pub is_final: bool,      // Whether this is the final chunk
    pub token_count: usize,  // Approximate token count (if enabled)
}
```

OpenAiCompatibleProvider implements SSE (Server-Sent Events) parsing for OpenAI-compatible streaming:
```
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]
```
The parser:
1. Splits the byte stream by newlines
2. Extracts the `data:` prefix
3. Checks for the `[DONE]` sentinel
4. Parses the JSON delta
5. Extracts the `content` or `reasoning_content` field
Fallback for Thinking Models:
Some models (Qwen3, GLM-4) stream reasoning in reasoning_content instead of content. The parser checks both fields.
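A minimal, dependency-free sketch of the per-line handling, including the `reasoning_content` fallback. The real parser decodes the JSON payload properly; the naive field scan here is only for illustration:

```rust
// Naive sketch of the SSE line handling described above. Returns the text
// delta for a data line, or None for [DONE] / non-data lines.
fn parse_sse_line(line: &str) -> Option<String> {
    let data = line.strip_prefix("data: ")?;
    if data.trim() == "[DONE]" {
        return None;
    }
    // Check "content" first, then the reasoning_content fallback used by
    // thinking models such as Qwen3 and GLM-4.
    for field in ["\"content\":\"", "\"reasoning_content\":\""] {
        if let Some(start) = data.find(field) {
            let start = start + field.len();
            let end = data[start..].find('"')? + start;
            return Some(data[start..end].to_string());
        }
    }
    None
}

fn main() {
    assert_eq!(
        parse_sse_line(r#"data: {"choices":[{"delta":{"content":"Hello"}}]}"#).as_deref(),
        Some("Hello")
    );
    assert_eq!(parse_sse_line("data: [DONE]"), None);
    assert_eq!(parse_sse_line(": keep-alive"), None);
    println!("ok");
}
```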
Sources: src/providers/traits.rs:148-171, src/providers/compatible.rs:365-485