
Reference

Complete technical reference for the ResilientLLM library API.

ResilientLLM

A unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google/Gemini, Ollama) with built-in resilience features including rate limiting, retries, circuit breakers, and error handling.

ResilientLLM Constructor

Creates a new ResilientLLM instance.

Signature:

new ResilientLLM(options?: ResilientLLMOptions)

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| options | ResilientLLMOptions | No | {} | Configuration options for the ResilientLLM instance |

ResilientLLMOptions:

| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| aiService | string | No | process.env.PREFERRED_AI_SERVICE or "anthropic" | AI service provider: "openai", "anthropic", "google", or "ollama" |
| model | string | No | process.env.PREFERRED_AI_MODEL or "claude-3-5-sonnet-20240620" | Model identifier for the selected AI service |
| temperature | number | No | process.env.AI_TEMPERATURE or 0 | Temperature parameter (0-2) controlling randomness in responses |
| maxTokens | number | No | process.env.MAX_TOKENS or 2048 | Maximum number of tokens in the response |
| timeout | number | No | process.env.LLM_TIMEOUT or 60000 | Request timeout in milliseconds |
| cacheStore | Object | No | {} | Cache store object for storing successful responses |
| maxInputTokens | number | No | process.env.MAX_INPUT_TOKENS or 100000 | Maximum number of input tokens allowed |
| topP | number | No | process.env.AI_TOP_P or 0.95 | Top-p sampling parameter (0-1) |
| rateLimitConfig | RateLimitConfig | No | { requestsPerMinute: 10, llmTokensPerMinute: 150000 } | Rate limiting configuration |
| retries | number | No | 3 | Number of retry attempts for failed requests |
| backoffFactor | number | No | 2 | Exponential backoff multiplier between retries |
| onRateLimitUpdate | Function | No | undefined | Callback invoked when rate limit information is updated |
| onError | Function | No | undefined | Currently not used (reserved for future use) |

RateLimitConfig:

| Property | Type | Description |
|---|---|---|
| requestsPerMinute | number | Maximum number of requests allowed per minute |
| llmTokensPerMinute | number | Maximum number of LLM tokens allowed per minute |

Returns: ResilientLLM instance

Example:

const llm = new ResilientLLM({
  aiService: 'openai',
  model: 'gpt-5-nano',
  maxTokens: 2048,
  temperature: 0.7,
  rateLimitConfig: {
    requestsPerMinute: 60,
    llmTokensPerMinute: 90000
  }
});

ResilientLLM Instance Methods

chat(conversationHistory, llmOptions?)

Sends a chat completion request to the configured LLM provider.

Signature:

chat(conversationHistory: Message[], llmOptions?: ChatOptions): Promise<ChatResponse>

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| conversationHistory | Message[] | Yes | Array of message objects representing the conversation history |
| llmOptions | ChatOptions | No | Override options for this specific request |

Message:

| Property | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | Message role: "system", "user", "assistant", or "tool" |
| content | string | Yes | Message content |

ChatOptions:

| Property | Type | Description |
|---|---|---|
| aiService | string | Override AI service for this request |
| model | string | Override model for this request |
| maxTokens | number | Override max tokens for this request |
| temperature | number | Override temperature for this request |
| topP | number | Override top-p for this request |
| maxInputTokens | number | Override max input tokens for this request |
| maxCompletionTokens | number | Maximum completion tokens (for reasoning models) |
| reasoningEffort | string | Reasoning effort level: "low", "medium", or "high" (for reasoning models) |
| apiKey | string | Override API key for this request (takes precedence over ProviderRegistry) |
| tools | Tool[] | Array of tool definitions for function calling |
| responseFormat | Object \| string | Response format specification (json_object/json_schema object shapes, plain schema-like object, or JSON aliases: "json", "object", "json_object") |
| outputConfig | Object | Legacy/migration support. Anthropic-style alternative structured-output input shape, normalized internally via responseFormat. Prefer responseFormat for all new usage. |
| response_format | Object \| string | Legacy/migration support. Snake_case alias for responseFormat; passthrough-friendly for provider-native payloads. Prefer responseFormat for all new usage. |
| output_config | Object | Legacy/migration support. Snake_case alias for outputConfig; passed through as-is when provided. Prefer responseFormat for all new usage. |

Use one naming style per field to avoid ambiguity:

  • Prefer camelCase (responseFormat or its alias outputConfig) in app code.
  • Prefer snake_case (response_format, output_config) when reusing raw provider payload snippets.
  • Do not send both aliases for the same field in one request; conflicting values may result in an error.
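The alias rules above can be enforced in caller code before the request is made. The sketch below is a hypothetical guard (pickResponseFormat is not a library export), shown only to illustrate normalizing to a single alias:

```javascript
// Hypothetical pre-flight guard: reject requests that supply both the
// camelCase and snake_case alias for the response-format field.
function pickResponseFormat(options) {
  const { responseFormat, response_format } = options;
  if (responseFormat !== undefined && response_format !== undefined) {
    throw new Error('Provide only one of responseFormat / response_format');
  }
  // Whichever alias is present (or undefined if neither was given).
  return responseFormat ?? response_format;
}
```

A caller could run this over llmOptions before invoking chat() so that conflicting aliases fail fast locally.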

Tool:

| Property | Type | Description |
|---|---|---|
| type | string | Tool type, typically "function" |
| function | Object | Function definition |
| function.name | string | Function name |
| function.description | string | Function description |
| function.parameters | Object | Function parameters schema (OpenAI format) |
| function.input_schema | Object | Function input schema (Anthropic format) |

Returns: Promise<ChatResponse>

  • Always returns a predictable envelope:
    • response.content is the assistant output (string in text mode, parsed object in JSON/schema mode)
    • response.toolCalls is included when tool calls are returned
  • response.metadata is always included

ChatResponse:

| Property | Type | Description |
|---|---|---|
| content | string \| Object \| null | The assistant content (text by default, normalized JSON object in JSON modes) |
| toolCalls | Array | Array of tool call objects (if tools were used) |
| metadata | OperationMetadata | Always included (request id, config, timing, retries, rate limiting, usage, etc.) |

Throws:

  • ResilientLLMError — Normalized failures from chat() (after internal retries when applicable). Use error.code (ResilientLLMErrorCode), error.retryable, error.metadata, and error.cause (log server-side). The canonical code list is in lib/ResilientLLMError.ts.
  • Structured output failures use codes such as JSON_PARSE_ERROR, JSON_MODE_FAILURE, SCHEMA_MISMATCH, or VALIDATION_ERROR; details may appear on error.cause.

Notes:

  • API keys can be provided via llmOptions.apiKey, ProviderRegistry.configure(), or environment variables
  • The implementation uses ProviderRegistry to manage providers and their configurations
  • Response parsing is handled generically using provider-specific chatConfig settings
  • For schema mode, validation checks top-level required fields and primitive types (string, number, boolean, integer). Schema mismatch errors include a validation object with missingFields, extraFields, and typeMismatches arrays

Example:

const conversationHistory = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is the capital of France?' }
];

const { content } = await llm.chat(conversationHistory);
console.log(content); // "The capital of France is Paris."

Example with tools:

const response = await llm.chat(conversationHistory, {
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        }
      }
    }
  }]
});
// response: { content: null, toolCalls: [...] }

Example with API key override:

// Override API key for this specific request
const response = await llm.chat(conversationHistory, {
  apiKey: 'sk-custom-key-here',
  aiService: 'openai',
  model: 'gpt-5-nano'
});

Example with operation metadata:

const llm = new ResilientLLM({
  aiService: 'openai',
  model: 'gpt-5-nano',
});

const { content, metadata } = await llm.chat(conversationHistory);
console.log(content);           // Assistant reply text
console.log(metadata?.requestId);
console.log(metadata?.timing?.totalTimeMs);
console.log(metadata?.usage);    // prompt_tokens, completion_tokens, total_tokens

abort()

Cancels all ongoing LLM operations for this instance.

Signature:

abort(): void

Returns: void

Description:

  • Aborts all active HTTP requests initiated by this ResilientLLM instance
  • Clears all resilient operation instances
  • Resets the internal abort controller

Example:

const promise = llm.chat(conversationHistory);
llm.abort(); // Cancels the ongoing request

Note: For API URLs and key checks, import ProviderRegistry: use ProviderRegistry.getChatApiUrl(providerName) and ProviderRegistry.buildApiUrl(providerName, baseUrl, null) for URLs; use ProviderRegistry.hasApiKey(providerName) to check if a key is present (keys are not exposed). See Custom Provider Guide for details.


formatMessageForAnthropic(messages)

Converts a messages array to the format required by Anthropic's API.

Signature:

formatMessageForAnthropic(messages: Message[]): { system?: string, messages: Message[] }

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| messages | Message[] | Yes | Array of message objects |

Returns: Object with properties:

  • system - string | undefined - System message content if present
  • messages - Message[] - Messages array without system messages

Description:

  • Extracts system messages from the messages array
  • Returns system content separately and remaining messages without system role

Example:

const messages = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello!' }
];

const { system, messages: chatMessages } = llm.formatMessageForAnthropic(messages);
// system: "You are helpful."
// chatMessages: [{ role: 'user', content: 'Hello!' }]

parseError(statusCode, error, operationMetadata?)

Normalizes an error into ResilientLLMError. Used internally when chat() fails; you can call it directly if you need the same mapping (e.g. tests).

Signature:

parseError(statusCode: number | null, error: Error, operationMetadata?: OperationMetadata | null): never

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| statusCode | number \| null | Yes | Provider HTTP status when known, or null |
| error | Error | Yes | Underlying error |
| operationMetadata | OperationMetadata \| null | No | Merged onto the thrown error's metadata |

Returns: never — always throws ResilientLLMError.

If error is already a ResilientLLMError, it is rethrown (metadata may be merged). Otherwise statusCode selects a PROVIDER_* code (e.g. 401 maps to PROVIDER_UNAUTHORIZED); null or unknown statuses map to PROVIDER_ERROR. See lib/ResilientLLMError.ts for the full ResilientLLMErrorCode union.


parseChatCompletion(data, chatConfig, tools?)

Generic method to parse chat completion response using provider configuration. This is the preferred method used internally.

Signature:

parseChatCompletion(data: Object, chatConfig: Object, tools?: Tool[]): string | ChatResponse

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| data | Object | Yes | API response object |
| chatConfig | Object | Yes | Chat configuration from provider (contains responseParsePath) |
| tools | Tool[] | No | Tools array if function calling was used |

Returns: string | ChatResponse

  • If tools provided and tool calls found: Returns ChatResponse with content and toolCalls
  • Otherwise: Returns string content

chatConfig.responseParsePath:

  • Path to extract content from response (e.g., 'choices[0].message.content', 'content[0].text', 'response')
  • Supports dot notation and bracket notation for nested values
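The dot/bracket path behavior described above can be illustrated with a small standalone extractor. getByPath below is a hypothetical helper, not a library export; it only sketches how a responseParsePath string resolves against a response object:

```javascript
// Resolve a dot/bracket path such as 'choices[0].message.content'
// against a nested object, returning undefined for missing segments.
function getByPath(obj, path) {
  const parts = path
    .replace(/\[(\d+)\]/g, '.$1') // turn [0] into .0
    .split('.')
    .filter(Boolean);
  return parts.reduce((acc, key) => (acc == null ? undefined : acc[key]), obj);
}
```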

Example:

const chatConfig = {
  responseParsePath: 'choices[0].message.content',
  toolSchemaType: 'openai'
};
const data = {
  choices: [{
    message: {
      content: "Hello!",
      tool_calls: []
    }
  }]
};
const content = llm.parseChatCompletion(data, chatConfig);
// "Hello!"

parseOpenAIChatCompletion(data, tools?) (Deprecated)

Parses OpenAI chat completion response.

Signature:

parseOpenAIChatCompletion(data: Object, tools?: Tool[]): string | ChatResponse

Status: ⚠️ Deprecated - Use parseChatCompletion() with chatConfig instead.


parseAnthropicChatCompletion(data, tools?) (Deprecated)

Parses Anthropic chat completion response.

Signature:

parseAnthropicChatCompletion(data: Object, tools?: Tool[]): string

Status: ⚠️ Deprecated - Use parseChatCompletion() with chatConfig instead.


parseOllamaChatCompletion(data, tools?) (Deprecated)

Parses Ollama chat completion response.

Signature:

parseOllamaChatCompletion(data: Object, tools?: Tool[]): string

Status: ⚠️ Deprecated - Use parseChatCompletion() with chatConfig instead.


parseGoogleChatCompletion(data, tools?) (Deprecated)

Parses Google chat completion response (OpenAI-compatible endpoint).

Signature:

parseGoogleChatCompletion(data: Object, tools?: Tool[]): string

Status: ⚠️ Deprecated - Use parseChatCompletion() with chatConfig instead.


retryChatWithAlternateService(conversationHistory, llmOptions?)

Retries the chat request with an alternate AI service when the current service returns rate limit errors (429, 529).

Signature:

retryChatWithAlternateService(conversationHistory: Message[], llmOptions?: ChatOptions): Promise<ChatResponse>

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| conversationHistory | Message[] | Yes | Array of message objects |
| llmOptions | ChatOptions | No | LLM options for the request |

Returns: Promise<ChatResponse> - Response from the alternate service

Throws:

  • Error - If no alternative service is available

Description:

  • Automatically switches to the next available service from ProviderRegistry.getDefaultModels()
  • Skips services that have already failed
  • Uses default model for each service

Example:

// Automatically called internally when rate limit errors occur
// Can also be called manually if needed
const response = await llm.retryChatWithAlternateService(conversationHistory);

ResilientLLM Static Methods

estimateTokens(text)

Estimates the number of tokens in a given text string.

Signature:

static estimateTokens(text: string): number

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to estimate tokens for |

Returns: number - Estimated token count

Description:

  • For texts longer than 10,000 characters: Uses approximation (~4 characters per token)
  • For shorter texts: Uses accurate tokenization with Tiktoken encoder (o200k_base encoding)
  • Uses lazy initialization of the encoder

Example:

const tokenCount = ResilientLLM.estimateTokens("Hello, world!");
// Returns estimated token count
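The switching strategy described above can be sketched as a standalone function. approximateTokens is a hypothetical helper (not the library's implementation); the real method uses a lazily initialized Tiktoken o200k_base encoder for the short-text path:

```javascript
// Sketch of the estimation strategy: texts over 10,000 characters use the
// ~4 characters/token approximation; shorter texts would go through a real
// tokenizer, stubbed here via the optional `encode` callback.
function approximateTokens(text, encode = null) {
  const LONG_TEXT_THRESHOLD = 10000;
  if (text.length > LONG_TEXT_THRESHOLD || encode === null) {
    return Math.ceil(text.length / 4); // coarse approximation
  }
  return encode(text).length; // accurate count from a tokenizer
}
```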

Types and Interfaces

Message

Represents a single message in a conversation.

interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

ChatResponse

Response envelope returned by chat() on every call.

  • content is the assistant output:
    • text mode -> string
    • JSON/schema mode -> parsed JS object
  • toolCalls is present when tool calls were returned
  • metadata is always included

interface ChatResponse {
  content: string | Object | null;
  toolCalls?: Array<any>;
  metadata: OperationMetadata;
}

OperationMetadata

Operation metadata attached to ChatResponse.metadata on every call. Used for observability, logging, and debugging.

interface OperationMetadata {
  requestId: string;
  operationId: string;
  startTime: number;
  finishReason?: string | null;
  config: {
    aiService: string;
    model: string;
    temperature: number | null;
    maxTokens: number | null;
    topP: number | null;
    maxInputTokens: number;
    estimatedInputTokens: number;
    enableCache: boolean;
    // ... resilience config (retries, rateLimitConfig, etc.)
  };
  events: Array<any>;
  timing: {
    totalTimeMs: number | null;
    rateLimitWaitMs: number;
    httpRequestMs: number | null;
  };
  retries: Array<any>;
  rateLimiting: { requestedTokens: number; totalWaitMs: number; [key: string]: any };
  circuitBreaker: Object;
  http: {
    url: string;
    method: string;
    statusCode: number | null;
    headers: Record<string, string>;
    durationMs?: number;
    error?: string;
  };
  cache: { enabled: boolean; [key: string]: any };
  service: { attempted: string[]; final: string };
  usage?: {
    prompt_tokens: number | null;
    completion_tokens: number | null;
    total_tokens: number | null;
  };
}

RateLimitConfig

Configuration for rate limiting.

interface RateLimitConfig {
  requestsPerMinute: number;
  llmTokensPerMinute: number;
}

ResilientLLMOptions

Constructor options for ResilientLLM.

interface ResilientLLMOptions {
  aiService?: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
  timeout?: number;
  cacheStore?: Object;
  maxInputTokens?: number;
  topP?: number;
  rateLimitConfig?: RateLimitConfig;
  retries?: number;
  backoffFactor?: number;
  onRateLimitUpdate?: (info: RateLimitInfo) => void;
  onError?: (error: Error) => void;
}

ChatOptions

Options for individual chat requests.

interface ChatOptions {
  aiService?: string;
  model?: string;
  maxTokens?: number;
  temperature?: number;
  topP?: number;
  maxInputTokens?: number;
  maxCompletionTokens?: number;
  reasoningEffort?: 'low' | 'medium' | 'high';
  apiKey?: string;
  tools?: Tool[];
  responseFormat?: Object;
  outputConfig?: Object;
}

responseFormat (JSON mode + schema mode)

Use responseFormat when you need the assistant response as JSON, optionally matching a particular schema.

  • JSON mode (no schema): ensures the reply is a single JSON object (library parses it for you).
  • Schema mode: provides a JSON Schema so the library can validate the parsed object and throw SCHEMA_MISMATCH when required keys/types don’t match.

Supplying a schema

You can supply a schema in any of these equivalent shapes (pick one and stick to it):

  • OpenAI-style wrapper (recommended when you want to be explicit):
responseFormat: {
  type: 'json_schema',
  json_schema: {
    name: 'my_payload',
    schema: {
      type: 'object',
      properties: {
        answer: { type: 'string' },
        citations: { type: 'array', items: { type: 'string' } }
      },
      required: ['answer']
    }
  }
}
  • Short wrapper (schema directly on the object):
responseFormat: {
  type: 'json_schema',
  schema: {
    type: 'object',
    properties: { answer: { type: 'string' } },
    required: ['answer']
  }
}
  • Plain schema-like object (auto-detected as a schema):
responseFormat: {
  type: 'object',
  properties: { answer: { type: 'string' } },
  required: ['answer']
}

End-to-end example (schema mode)

const llm = new ResilientLLM({ aiService: 'openai', model: 'gpt-5-nano' });

const result = await llm.chat(
  [{ role: 'user', content: 'Return an answer and citations.' }],
  {
    responseFormat: {
      type: 'json_schema',
      json_schema: {
        name: 'answer_payload',
        schema: {
          type: 'object',
          properties: {
            answer: { type: 'string' },
            citations: { type: 'array', items: { type: 'string' } }
          },
          required: ['answer']
        }
      }
    }
  }
);

// `result.content` is a parsed JS object when `responseFormat` requests JSON/schema mode.

Validation scope (important)

The built-in validator is intentionally lightweight: it checks required keys, extra keys, and primitive types at the top level (string, number, boolean, integer).

  • Extra keys are enforced only when your schema sets additionalProperties: false (and the schema has properties).
  • For deeper validation needs (nested objects, enums, regex, oneOf/anyOf, etc.), run your own schema validator after the call.
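To make the stated scope concrete, here is a minimal sketch of top-level checks like the ones described (required keys, extra keys when additionalProperties is false, primitive types). validateTopLevel is illustrative only, not the library's internal validator, though its result shape mirrors the validation object mentioned earlier:

```javascript
// Check only the top level of an object against a schema:
// required keys, extra keys (when additionalProperties: false),
// and primitive types (string, number, boolean, integer).
function validateTopLevel(schema, obj) {
  const props = schema.properties || {};
  const required = schema.required || [];
  const missingFields = required.filter((k) => !(k in obj));
  const extraFields =
    schema.additionalProperties === false
      ? Object.keys(obj).filter((k) => !(k in props))
      : [];
  const typeMismatches = Object.keys(props)
    .filter((k) => k in obj)
    .filter((k) => {
      const expected = props[k].type;
      const value = obj[k];
      if (expected === 'string') return typeof value !== 'string';
      if (expected === 'number') return typeof value !== 'number';
      if (expected === 'boolean') return typeof value !== 'boolean';
      if (expected === 'integer') return !Number.isInteger(value);
      return false; // non-primitive types are not checked at this level
    });
  return { missingFields, extraFields, typeMismatches };
}
```

For nested or richer constraints, run a full JSON Schema validator on result.content after the call, as noted above.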

Example: additionalProperties: false + required

const result = await llm.chat(messages, {
  responseFormat: {
    type: 'json_schema',
    json_schema: {
      name: 'answer_payload',
      schema: {
        type: 'object',
        additionalProperties: false,
        properties: {
          answer: { type: 'string' }
        },
        required: ['answer']
      }
    }
  }
});

// `result.content` is { answer: string } when the model output matches the schema.
// If the model returns invalid JSON or extra keys, `llm.chat(...)` throws StructuredOutputError (e.g. `SCHEMA_MISMATCH`).

responseFormat examples (quick)

// JSON alias strings (equivalent to { type: 'json_object' })
'json'
'object'
'json_object'

// OpenAI-compatible JSON mode
{ type: 'json_object' }

// When `responseFormat` requests JSON, `llm.chat(...)` resolves to a response envelope
// where `.content` is the parsed JS object.

Tool

Tool definition for function calling.

interface Tool {
  type: string;
  function: {
    name: string;
    description: string;
    parameters?: Object;  // OpenAI format
    input_schema?: Object; // Anthropic format
  };
}

Error Codes

Failures from chat() are thrown as ResilientLLMError (see chat() Throws above). That type is the consumer-facing surface: code, retryable, optional metadata (same shape as success), and cause for logging.

Stable string codes are defined by the ResilientLLMErrorCode union in lib/ResilientLLMError.ts (including PROVIDER_*, structured-output codes, resilience-related codes, and configuration/capability codes). retryable is defined there for codes where a simple retry might help.

Use error.code for branching, not raw HTTP status. When a provider HTTP status was available to the library, it may also appear under metadata (e.g. provider.httpStatus / http).
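Branching on error.code can look like the sketch below. shouldFallBack is a hypothetical caller-side policy helper, and only codes named in this document are used; consult ResilientLLMErrorCode for the full list:

```javascript
// Decide whether a caught ResilientLLMError warrants a caller-side
// fallback, based on its stable code rather than the raw HTTP status.
function shouldFallBack(error) {
  switch (error.code) {
    case 'PROVIDER_ERROR':
      return error.retryable === true; // transient provider failure
    case 'PROVIDER_UNAUTHORIZED':
      return false; // fix credentials; retrying will not help
    case 'JSON_PARSE_ERROR':
    case 'SCHEMA_MISMATCH':
      return false; // fix the prompt or schema instead of retrying
    default:
      return false;
  }
}
```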


Environment Variables

API Key Configuration

API keys are required for most LLM providers. They can be provided in three ways (in order of precedence):

  1. Per-request via llmOptions.apiKey (highest priority)
  2. Via ProviderRegistry.configure() with direct apiKey parameter
  3. Via environment variables (lowest priority)

For advanced use cases (custom providers, multiple API keys, or programmatic configuration), see the Custom Provider Guide - Authentication Configuration.
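The precedence order above boils down to a simple first-non-null resolution. resolveApiKey is a hypothetical helper shown only to make the ordering explicit; the library performs this resolution internally:

```javascript
// Resolve an API key by the documented precedence:
// per-request option, then ProviderRegistry, then environment variable.
function resolveApiKey({ perRequestKey, registryKey, envKey }) {
  return perRequestKey ?? registryKey ?? envKey ?? null;
}
```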

Required (Service-Specific)

Set at least one API key for your chosen service:

| Variable | Service | Required |
|---|---|---|
| OPENAI_API_KEY | OpenAI | Yes (if using OpenAI) |
| ANTHROPIC_API_KEY | Anthropic | Yes (if using Anthropic) |
| GOOGLE_API_KEY or GOOGLE_GENERATIVE_AI or GEMINI_API_KEY | Google | Yes (if using Google) |
| OLLAMA_API_KEY | Ollama | No (optional) |

Note: For custom providers, use the environment variable names specified in ProviderRegistry.configure() via envVarNames.

Optional Configuration

| Variable | Default | Description |
|---|---|---|
| PREFERRED_AI_SERVICE | "anthropic" | Default AI service |
| PREFERRED_AI_MODEL | "claude-3-5-sonnet-20240620" | Default model |
| AI_TEMPERATURE | 0 | Default temperature |
| MAX_TOKENS | 2048 | Default max tokens |
| LLM_TIMEOUT | 60000 | Default timeout (ms) |
| MAX_INPUT_TOKENS | 100000 | Default max input tokens |
| AI_TOP_P | 0.95 | Default top-p value |
| OLLAMA_API_URL | "http://localhost:11434/api/generate" | Ollama API URL |
| STORE_AI_API_CALLS | undefined | Set to "true" to store API calls (OpenAI) |

API Response Formats

OpenAI Response

{
  "id": "chatcmpl-123456",
  "object": "chat.completion",
  "created": 1728933352,
  "model": "gpt-4o-2024-08-06",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text",
      "tool_calls": []
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 10,
    "total_tokens": 29
  }
}

Anthropic Response

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [{
    "type": "text",
    "text": "Response text"
  }],
  "model": "claude-3-5-sonnet-20240620",
  "usage": {
    "input_tokens": 19,
    "output_tokens": 10
  }
}

Gemini Response (OpenAI-Compatible)

Same format as OpenAI response.

Ollama Response

{
  "model": "llama3.1:8b",
  "created_at": "2024-01-01T00:00:00.000Z",
  "response": "Response text",
  "done": true,
  "context": [],
  "total_duration": 1000,
  "load_duration": 500,
  "prompt_eval_count": 10,
  "prompt_eval_duration": 200,
  "eval_count": 20,
  "eval_duration": 300
}

Supported Models

Default Models

Each service has a default model configured. Use ProviderRegistry.getDefaultModels() to get all default models:

  • Anthropic: claude-3-5-sonnet-20240620
  • OpenAI: gpt-5-nano
  • Google: gemini-2.0-flash
  • Ollama: llama3.1:8b

Reasoning Models

Models starting with "o" (e.g., "o1", "o3") or "gpt-5" are treated as reasoning models and use different parameters:

  • max_completion_tokens instead of max_tokens
  • reasoning_effort parameter ("low", "medium", "high", defaults to "medium")
  • No temperature or top_p parameters
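The parameter selection above can be sketched as a request-body builder. isReasoningModel and buildParams are hypothetical helper names, not part of the library API; they only restate the documented rules:

```javascript
// Models starting with "o" or "gpt-5" are treated as reasoning models.
function isReasoningModel(model) {
  return model.startsWith('o') || model.startsWith('gpt-5');
}

// Build provider request parameters per the documented rules.
function buildParams(model, { maxTokens, temperature, topP, reasoningEffort }) {
  if (isReasoningModel(model)) {
    return {
      max_completion_tokens: maxTokens,
      reasoning_effort: reasoningEffort || 'medium',
      // reasoning models take no temperature or top_p
    };
  }
  return { max_tokens: maxTokens, temperature, top_p: topP };
}
```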

Rate Limiting Behavior

Token Bucket Algorithm

The library uses a token bucket algorithm with two buckets:

  1. Request Bucket: Limits requests per minute
  2. LLM Token Bucket: Limits LLM tokens per minute
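A minimal token bucket can be sketched as below. This is illustrative, not the library's implementation; two instances of such a class (one sized in requests, one in LLM tokens) yield the per-minute limits described above:

```javascript
// Minimal token bucket: capacity refills continuously over one minute.
class TokenBucket {
  constructor(capacityPerMinute, now = Date.now) {
    this.capacity = capacityPerMinute;
    this.tokens = capacityPerMinute;
    this.now = now; // injectable clock for testing
    this.lastRefill = now();
  }
  refill() {
    const elapsedMs = this.now() - this.lastRefill;
    const refillAmount = (elapsedMs / 60000) * this.capacity;
    this.tokens = Math.min(this.capacity, this.tokens + refillAmount);
    this.lastRefill = this.now();
  }
  // Returns true and deducts if enough tokens remain; false otherwise.
  tryConsume(amount = 1) {
    this.refill();
    if (this.tokens >= amount) {
      this.tokens -= amount;
      return true;
    }
    return false;
  }
}
```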

Dynamic Updates

Rate limits can be updated dynamically from API response headers:

  • retry-after header is respected
  • Rate limit information from responses updates buckets automatically
  • onRateLimitUpdate callback is invoked when limits change
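Respecting a retry-after header typically means converting it to a wait in milliseconds. retryAfterMs is a hypothetical helper sketching that conversion (the header may carry delay-seconds or an HTTP date), not the library's internal code:

```javascript
// Convert a retry-after header value to a wait in milliseconds.
function retryAfterMs(headerValue, now = Date.now) {
  if (headerValue == null) return 0;
  const seconds = Number(headerValue);
  if (!Number.isNaN(seconds)) return Math.max(0, seconds * 1000);
  const dateMs = Date.parse(headerValue); // HTTP-date form
  if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - now());
  return 0; // unparseable value: do not wait
}
```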

Circuit Breaker Integration

  • Each retry attempt counts as a separate failure
  • Circuit opens after configured failure threshold
  • Cooldown period prevents immediate retries
  • Success resets the failure count
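The behavior above can be captured in a small state machine. This sketch is illustrative (threshold and cooldown values are assumptions, not the library's defaults):

```javascript
// Minimal circuit breaker: failures accumulate, the circuit opens at a
// threshold, a cooldown must elapse before new attempts, success resets.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000 } = {}, now = Date.now) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
    this.now = now;
  }
  canAttempt() {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.cooldownMs; // cooldown elapsed
  }
  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) this.openedAt = this.now();
  }
  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }
}
```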

Caching

Cache Store

Provide a cache store object in constructor options:

const cacheStore = {};
const llm = new ResilientLLM({ cacheStore });

Cache Key Generation

Cache keys are SHA-256 hashes of:

  • API URL
  • Request body (JSON stringified)
  • Headers (JSON stringified)

Cache Behavior

  • Only successful responses (status 200) are cached
  • Cache is checked before making HTTP requests
  • Cache hits return immediately without API call

AbortController Support

Cancellation

Use abort() method to cancel all ongoing operations:

const llm = new ResilientLLM({ /* ... */ });
const promise = llm.chat(conversationHistory);
llm.abort(); // Cancels the request

Timeout

Timeouts are enforced using AbortController:

  • Timeout applies to entire operation (including retries)
  • On timeout, AbortController aborts the HTTP request
  • chat() rejects with ResilientLLMError; the original timeout is typically on error.cause (name may be TimeoutError)
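The AbortController wiring described above follows a common pattern, sketched below. withTimeout is a generic illustration, not a library export; the library manages its controller internally around its HTTP calls:

```javascript
// Arm a timer that aborts the returned signal after `ms`; callers pass
// the signal to their HTTP request and cancel the timer on completion.
function withTimeout(ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return {
    signal: controller.signal,
    cancel: () => clearTimeout(timer),
  };
}
```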

Service-Specific Notes

Provider Management

All providers are managed through ProviderRegistry. The implementation uses:

  • ProviderRegistry.get(providerName) - Get provider configuration
  • ProviderRegistry.getChatApiUrl(providerName) - Get chat API URL
  • ProviderRegistry.getChatConfig(providerName) - Get chat configuration
  • ProviderRegistry.buildApiUrl(providerName, url) - Build API URL with query params if needed
  • ProviderRegistry.buildAuthHeaders(providerName, apiKey, defaultHeaders) - Build authentication headers
  • ProviderRegistry.hasApiKey(providerName) - Check if API key is available

See Custom Provider Guide for details on configuring providers.

Anthropic

  • System messages are extracted and sent separately
  • Tool definitions use input_schema instead of parameters
  • API version header: anthropic-version: 2023-06-01
  • Uses x-api-key header instead of Authorization

OpenAI

  • Supports function calling with tools parameter
  • Supports response_format for JSON mode
  • Uses standard Authorization: Bearer <token> header
  • Can store API calls if STORE_AI_API_CALLS=true

Google

  • Uses OpenAI-compatible endpoint
  • Same format as OpenAI for requests/responses
  • Requires GEMINI_API_KEY environment variable
  • Authentication: Uses header authentication (Authorization: Bearer {key}) for chat endpoints, query parameter authentication (?key=...) for models endpoint

Ollama

  • Defaults to http://localhost:11434/api/generate
  • Can override with OLLAMA_API_URL environment variable
  • API key is optional
  • Uses different response format