Complete technical reference for the ResilientLLM library API.
A unified interface for interacting with multiple LLM providers (OpenAI, Anthropic, Google/Gemini, Ollama) with built-in resilience features including rate limiting, retries, circuit breakers, and error handling.
Creates a new ResilientLLM instance.
Signature:
```typescript
new ResilientLLM(options?: ResilientLLMOptions)
```

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `options` | `ResilientLLMOptions` | No | `{}` | Configuration options for the ResilientLLM instance |
ResilientLLMOptions:
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| `aiService` | `string` | No | `process.env.PREFERRED_AI_SERVICE` or `"anthropic"` | AI service provider: `"openai"`, `"anthropic"`, `"google"`, or `"ollama"` |
| `model` | `string` | No | `process.env.PREFERRED_AI_MODEL` or `"claude-3-5-sonnet-20240620"` | Model identifier for the selected AI service |
| `temperature` | `number` | No | `process.env.AI_TEMPERATURE` or `0` | Temperature parameter (0-2) controlling randomness in responses |
| `maxTokens` | `number` | No | `process.env.MAX_TOKENS` or `2048` | Maximum number of tokens in the response |
| `timeout` | `number` | No | `process.env.LLM_TIMEOUT` or `60000` | Request timeout in milliseconds |
| `cacheStore` | `Object` | No | `{}` | Cache store object for storing successful responses |
| `maxInputTokens` | `number` | No | `process.env.MAX_INPUT_TOKENS` or `100000` | Maximum number of input tokens allowed |
| `topP` | `number` | No | `process.env.AI_TOP_P` or `0.95` | Top-p sampling parameter (0-1) |
| `rateLimitConfig` | `RateLimitConfig` | No | `{ requestsPerMinute: 10, llmTokensPerMinute: 150000 }` | Rate limiting configuration |
| `retries` | `number` | No | `3` | Number of retry attempts for failed requests |
| `backoffFactor` | `number` | No | `2` | Exponential backoff multiplier between retries |
| `onRateLimitUpdate` | `Function` | No | `undefined` | Callback invoked when rate limit information is updated |
| `onError` | `Function` | No | `undefined` | Currently unused (reserved for future use) |
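The `retries` and `backoffFactor` options above imply an exponentially growing wait between attempts. The schedule can be sketched as follows (the 1000 ms base delay and the exact formula are illustrative assumptions, not the library's internals):

```javascript
// Hypothetical helper: delay before retry attempt `n` (0-based),
// growing by `backoffFactor` each time. The base delay is an assumed value.
function backoffDelayMs(attempt, backoffFactor = 2, baseMs = 1000) {
  return baseMs * Math.pow(backoffFactor, attempt);
}

// With the defaults (retries: 3, backoffFactor: 2) the waits would be:
const delays = [0, 1, 2].map((n) => backoffDelayMs(n));
console.log(delays); // [1000, 2000, 4000]
```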
RateLimitConfig:
| Property | Type | Description |
|---|---|---|
| `requestsPerMinute` | `number` | Maximum number of requests allowed per minute |
| `llmTokensPerMinute` | `number` | Maximum number of LLM tokens allowed per minute |
Returns: ResilientLLM instance
Example:
```javascript
const llm = new ResilientLLM({
  aiService: 'openai',
  model: 'gpt-5-nano',
  maxTokens: 2048,
  temperature: 0.7,
  rateLimitConfig: {
    requestsPerMinute: 60,
    llmTokensPerMinute: 90000
  }
});
```

Sends a chat completion request to the configured LLM provider.
Signature:
```typescript
chat(conversationHistory: Message[], llmOptions?: ChatOptions): Promise<ChatResponse>
```

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `conversationHistory` | `Message[]` | Yes | Array of message objects representing the conversation history |
| `llmOptions` | `ChatOptions` | No | Override options for this specific request |
Message:
| Property | Type | Required | Description |
|---|---|---|---|
| `role` | `string` | Yes | Message role: `"system"`, `"user"`, `"assistant"`, or `"tool"` |
| `content` | `string` | Yes | Message content |
ChatOptions:
| Property | Type | Description |
|---|---|---|
| `aiService` | `string` | Override AI service for this request |
| `model` | `string` | Override model for this request |
| `maxTokens` | `number` | Override max tokens for this request |
| `temperature` | `number` | Override temperature for this request |
| `topP` | `number` | Override top-p for this request |
| `maxInputTokens` | `number` | Override max input tokens for this request |
| `maxCompletionTokens` | `number` | Maximum completion tokens (for reasoning models) |
| `reasoningEffort` | `string` | Reasoning effort level: `"low"`, `"medium"`, or `"high"` (for reasoning models) |
| `apiKey` | `string` | Override API key for this request (takes precedence over ProviderRegistry) |
| `tools` | `Tool[]` | Array of tool definitions for function calling |
| `responseFormat` | `Object \| string` | Response format specification (`json_object`/`json_schema` object shapes, plain schema-like object, or JSON aliases: `"json"`, `"object"`, `"json_object"`) |
| `outputConfig` | `Object` | Legacy/migration support. Anthropic-style alternative structured-output input shape, normalized internally via `responseFormat`. Prefer `responseFormat` for all new usage. |
| `response_format` | `Object \| string` | Legacy/migration support. Snake_case alias for `responseFormat`; passthrough-friendly for provider-native payloads. Prefer `responseFormat` for all new usage. |
| `output_config` | `Object` | Legacy/migration support. Snake_case alias for `outputConfig`; passed through as-is when provided. Prefer `responseFormat` for all new usage. |
Use one naming style per field to avoid ambiguity:
- Prefer camelCase (`responseFormat` or its alias `outputConfig`) in app code.
- Prefer snake_case (`response_format`, `output_config`) when reusing raw provider payload snippets.
- Do not send both aliases for the same field in one request; conflicting values may result in an error.
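As a sketch of how such aliasing is typically resolved (an illustration, not the library's actual normalization code), a request option can be read by preferring the camelCase name and rejecting requests that send both spellings:

```javascript
// Hypothetical normalizer: picks one spelling of a field and throws
// when both spellings are supplied in the same request.
function pickAlias(options, camelName, snakeName) {
  const camel = options[camelName];
  const snake = options[snakeName];
  if (camel !== undefined && snake !== undefined) {
    throw new Error(`Provide either ${camelName} or ${snakeName}, not both`);
  }
  return camel !== undefined ? camel : snake;
}

const value = pickAlias({ response_format: 'json' }, 'responseFormat', 'response_format');
console.log(value); // 'json'
```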
Tool:
| Property | Type | Description |
|---|---|---|
| `type` | `string` | Tool type, typically `"function"` |
| `function` | `Object` | Function definition |
| `function.name` | `string` | Function name |
| `function.description` | `string` | Function description |
| `function.parameters` | `Object` | Function parameters schema (OpenAI format) |
| `function.input_schema` | `Object` | Function input schema (Anthropic format) |
Returns: Promise<ChatResponse>
- Always returns a predictable envelope:
  - `response.content` is the assistant output (string in text mode, parsed object in JSON/schema mode)
  - `response.toolCalls` is included when tool calls are returned
  - `response.metadata` is always included
ChatResponse:
| Property | Type | Description |
|---|---|---|
| `content` | `string \| Object \| null` | The assistant content (text by default, normalized JSON object in JSON modes) |
| `toolCalls` | `Array` | Array of tool call objects (if tools were used) |
| `metadata` | `OperationMetadata` | Always included (request id, config, timing, retries, rate limiting, usage, etc.) |
Throws:
- `ResilientLLMError` — Normalized failures from `chat()` (after internal retries when applicable). Use `error.code` (`ResilientLLMErrorCode`), `error.retryable`, `error.metadata`, and `error.cause` (log server-side). The canonical code list is in `lib/ResilientLLMError.ts`.
- Structured output failures use codes such as `JSON_PARSE_ERROR`, `JSON_MODE_FAILURE`, `SCHEMA_MISMATCH`, or `VALIDATION_ERROR`; details may appear on `error.cause`.
Notes:
- API keys can be provided via `llmOptions.apiKey`, `ProviderRegistry.configure()`, or environment variables
- The implementation uses `ProviderRegistry` to manage providers and their configurations
- Response parsing is handled generically using provider-specific `chatConfig` settings
- For schema mode, validation checks top-level required fields and primitive types (`string`, `number`, `boolean`, `integer`). Schema mismatch errors include a `validation` object with `missingFields`, `extraFields`, and `typeMismatches` arrays
Example:
```javascript
const conversationHistory = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'What is the capital of France?' }
];

const { content } = await llm.chat(conversationHistory);
console.log(content); // "The capital of France is Paris."
```

Example with tools:

```javascript
const response = await llm.chat(conversationHistory, {
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        }
      }
    }
  }]
});
// response: { content: null, toolCalls: [...] }
```

Example with API key override:

```javascript
// Override API key for this specific request
const response = await llm.chat(conversationHistory, {
  apiKey: 'sk-custom-key-here',
  aiService: 'openai',
  model: 'gpt-5-nano'
});
```

Example with operation metadata:

```javascript
const llm = new ResilientLLM({
  aiService: 'openai',
  model: 'gpt-5-nano',
});

const { content, metadata } = await llm.chat(conversationHistory);
console.log(content); // Assistant reply text
console.log(metadata?.requestId);
console.log(metadata?.timing?.totalTimeMs);
console.log(metadata?.usage); // prompt_tokens, completion_tokens, total_tokens
```

Cancels all ongoing LLM operations for this instance.
Signature:
```typescript
abort(): void
```

Returns: `void`
Description:
- Aborts all active HTTP requests initiated by this `ResilientLLM` instance
- Clears all resilient operation instances
- Resets the internal abort controller
Example:
```javascript
const promise = llm.chat(conversationHistory);
llm.abort(); // Cancels the ongoing request
```

Note: For API URLs and key checks, import `ProviderRegistry`: use `ProviderRegistry.getChatApiUrl(providerName)` and `ProviderRegistry.buildApiUrl(providerName, baseUrl, null)` for URLs; use `ProviderRegistry.hasApiKey(providerName)` to check if a key is present (keys are not exposed). See the Custom Provider Guide for details.
Converts a messages array to the format required by Anthropic's API.
Signature:
```typescript
formatMessageForAnthropic(messages: Message[]): { system?: string, messages: Message[] }
```

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `messages` | `Message[]` | Yes | Array of message objects |
Returns: Object with properties:
- `system` (`string | undefined`): System message content if present
- `messages` (`Message[]`): Messages array without system messages
Description:
- Extracts system messages from the messages array
- Returns system content separately and remaining messages without system role
Example:
```javascript
const input = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hello!' }
];

// Destructure into new names to avoid redeclaring `input`
const { system, messages } = llm.formatMessageForAnthropic(input);
// system: "You are helpful."
// messages: [{ role: 'user', content: 'Hello!' }]
```

Normalizes an error into `ResilientLLMError`. Used internally when `chat()` fails; you can call it directly if you need the same mapping (e.g. tests).
Signature:
```typescript
parseError(statusCode: number | null, error: Error, operationMetadata?: OperationMetadata | null): never
```

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `statusCode` | `number \| null` | Yes | Provider HTTP status when known, or `null` |
| `error` | `Error` | Yes | Underlying error |
| `operationMetadata` | `OperationMetadata \| null` | No | Merged onto the thrown error's metadata |
Returns: `never` — always throws `ResilientLLMError`.

If `error` is already a `ResilientLLMError`, it is rethrown (metadata may be merged). Otherwise `statusCode` selects a `PROVIDER_*` code (e.g. 401 → `PROVIDER_UNAUTHORIZED`); `null` or unknown statuses map to `PROVIDER_ERROR`. See `lib/ResilientLLMError.ts` for the full `ResilientLLMErrorCode` union.
Generic method to parse chat completion response using provider configuration. This is the preferred method used internally.
Signature:
```typescript
parseChatCompletion(data: Object, chatConfig: Object, tools?: Tool[]): string | ChatResponse
```

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `data` | `Object` | Yes | API response object |
| `chatConfig` | `Object` | Yes | Chat configuration from provider (contains `responseParsePath`) |
| `tools` | `Tool[]` | No | Tools array if function calling was used |
Returns: `string | ChatResponse`
- If `tools` provided and tool calls found: returns `ChatResponse` with `content` and `toolCalls`
- Otherwise: returns `string` content
`chatConfig.responseParsePath`:
- Path to extract content from response (e.g., `'choices[0].message.content'`, `'content[0].text'`, `'response'`)
- Supports dot notation and bracket notation for nested values
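A path like `'choices[0].message.content'` can be resolved with a small getter. This sketch mirrors the described dot/bracket behavior but is not the library's implementation:

```javascript
// Resolve a dot/bracket path such as 'choices[0].message.content'
// against a nested object. Returns undefined when a segment is missing.
function getByPath(obj, path) {
  // Normalize bracket notation to dots: choices[0] -> choices.0
  const segments = path.replace(/\[(\d+)\]/g, '.$1').split('.');
  return segments.reduce(
    (node, key) => (node == null ? undefined : node[key]),
    obj
  );
}

const data = { choices: [{ message: { content: 'Hello!' } }] };
console.log(getByPath(data, 'choices[0].message.content')); // "Hello!"
```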
Example:
```javascript
const chatConfig = {
  responseParsePath: 'choices[0].message.content',
  toolSchemaType: 'openai'
};

const data = {
  choices: [{
    message: {
      content: "Hello!",
      tool_calls: []
    }
  }]
};

const content = llm.parseChatCompletion(data, chatConfig);
// "Hello!"
```

Parses OpenAI chat completion response.
Signature:
```typescript
parseOpenAIChatCompletion(data: Object, tools?: Tool[]): string | ChatResponse
```

Status: Deprecated. Use `parseChatCompletion()` with `chatConfig` instead.
Parses Anthropic chat completion response.
Signature:
```typescript
parseAnthropicChatCompletion(data: Object, tools?: Tool[]): string
```

Status: Deprecated. Use `parseChatCompletion()` with `chatConfig` instead.
Parses Ollama chat completion response.
Signature:
```typescript
parseOllamaChatCompletion(data: Object, tools?: Tool[]): string
```

Status: Deprecated. Use `parseChatCompletion()` with `chatConfig` instead.
Parses Google chat completion response (OpenAI-compatible endpoint).
Signature:
```typescript
parseGoogleChatCompletion(data: Object, tools?: Tool[]): string
```

Status: Deprecated. Use `parseChatCompletion()` with `chatConfig` instead.
Retries the chat request with an alternate AI service when the current service returns rate limit errors (429, 529).
Signature:
```typescript
retryChatWithAlternateService(conversationHistory: Message[], llmOptions?: ChatOptions): Promise<ChatResponse>
```

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `conversationHistory` | `Message[]` | Yes | Array of message objects |
| `llmOptions` | `ChatOptions` | No | LLM options for the request |
Returns: `Promise<ChatResponse>` - Response from the alternate service

Throws:
- `Error` - If no alternative service is available
Description:
- Automatically switches to the next available service from `ProviderRegistry.getDefaultModels()`
- Skips services that have already failed
- Uses the default model for each service
Example:
```javascript
// Automatically called internally when rate limit errors occur
// Can also be called manually if needed
const response = await llm.retryChatWithAlternateService(conversationHistory);
```

Estimates the number of tokens in a given text string.
Signature:
```typescript
static estimateTokens(text: string): number
```

Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `text` | `string` | Yes | Text to estimate tokens for |
Returns: number - Estimated token count
Description:
- For texts longer than 10,000 characters: Uses approximation (~4 characters per token)
- For shorter texts: Uses accurate tokenization with Tiktoken encoder (o200k_base encoding)
- Uses lazy initialization of the encoder
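The long-text fallback described above is a simple characters-to-tokens ratio. A sketch of that approximation (the threshold and ratio follow the bullets above; the Tiktoken branch for shorter texts is omitted):

```javascript
// Rough token estimate for long texts: ~4 characters per token.
// Shorter texts would use a real tokenizer (Tiktoken, o200k_base) instead.
function approxTokens(text) {
  return Math.ceil(text.length / 4);
}

console.log(approxTokens('a'.repeat(40000))); // 10000
```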
Example:
```javascript
const tokenCount = ResilientLLM.estimateTokens("Hello, world!");
// Returns estimated token count
```

Represents a single message in a conversation.
```typescript
interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}
```

Response envelope returned by `chat()` on every call.
- `content` is the assistant output:
  - text mode -> `string`
  - JSON/schema mode -> parsed JS object
- `toolCalls` is present when tool calls were returned
- `metadata` is always included
```typescript
interface ChatResponse {
  content: string | Object | null;
  toolCalls?: Array<any>;
  metadata: OperationMetadata;
}
```

Operation metadata attached to `ChatResponse.metadata` on every call. Used for observability, logging, and debugging.
```typescript
interface OperationMetadata {
  requestId: string;
  operationId: string;
  startTime: number;
  finishReason?: string | null;
  config: {
    aiService: string;
    model: string;
    temperature: number | null;
    maxTokens: number | null;
    topP: number | null;
    maxInputTokens: number;
    estimatedInputTokens: number;
    enableCache: boolean;
    // ... resilience config (retries, rateLimitConfig, etc.)
  };
  events: Array<any>;
  timing: {
    totalTimeMs: number | null;
    rateLimitWaitMs: number;
    httpRequestMs: number | null;
  };
  retries: Array<any>;
  rateLimiting: { requestedTokens: number; totalWaitMs: number; [key: string]: any };
  circuitBreaker: Object;
  http: {
    url: string;
    method: string;
    statusCode: number | null;
    headers: Record<string, string>;
    durationMs?: number;
    error?: string;
  };
  cache: { enabled: boolean; [key: string]: any };
  service: { attempted: string[]; final: string };
  usage?: {
    prompt_tokens: number | null;
    completion_tokens: number | null;
    total_tokens: number | null;
  };
}
```

Configuration for rate limiting.
```typescript
interface RateLimitConfig {
  requestsPerMinute: number;
  llmTokensPerMinute: number;
}
```

Constructor options for ResilientLLM.
```typescript
interface ResilientLLMOptions {
  aiService?: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
  timeout?: number;
  cacheStore?: Object;
  maxInputTokens?: number;
  topP?: number;
  rateLimitConfig?: RateLimitConfig;
  retries?: number;
  backoffFactor?: number;
  onRateLimitUpdate?: (info: RateLimitInfo) => void;
  onError?: (error: Error) => void;
}
```

Options for individual chat requests.
```typescript
interface ChatOptions {
  aiService?: string;
  model?: string;
  maxTokens?: number;
  temperature?: number;
  topP?: number;
  maxInputTokens?: number;
  maxCompletionTokens?: number;
  reasoningEffort?: 'low' | 'medium' | 'high';
  apiKey?: string;
  tools?: Tool[];
  responseFormat?: Object;
  outputConfig?: Object;
}
```

Use `responseFormat` when you need the assistant response as JSON, optionally matching a particular schema.
- JSON mode (no schema): ensures the reply is a single JSON object (the library parses it for you).
- Schema mode: provides a JSON Schema so the library can validate the parsed object and throw `SCHEMA_MISMATCH` when required keys/types don't match.
Supplying a schema
You can supply a schema in any of these equivalent shapes (pick one and stick to it):
- OpenAI-style wrapper (recommended when you want to be explicit):

```javascript
responseFormat: {
  type: 'json_schema',
  json_schema: {
    name: 'my_payload',
    schema: {
      type: 'object',
      properties: {
        answer: { type: 'string' },
        citations: { type: 'array', items: { type: 'string' } }
      },
      required: ['answer']
    }
  }
}
```

- Short wrapper (schema directly on the object):
```javascript
responseFormat: {
  type: 'json_schema',
  schema: {
    type: 'object',
    properties: { answer: { type: 'string' } },
    required: ['answer']
  }
}
```

- Plain schema-like object (auto-detected as a schema):
```javascript
responseFormat: {
  type: 'object',
  properties: { answer: { type: 'string' } },
  required: ['answer']
}
```

End-to-end example (schema mode)
```javascript
const llm = new ResilientLLM({ aiService: 'openai', model: 'gpt-5-nano' });

const result = await llm.chat(
  [{ role: 'user', content: 'Return an answer and citations.' }],
  {
    responseFormat: {
      type: 'json_schema',
      json_schema: {
        name: 'answer_payload',
        schema: {
          type: 'object',
          properties: {
            answer: { type: 'string' },
            citations: { type: 'array', items: { type: 'string' } }
          },
          required: ['answer']
        }
      }
    }
  }
);
// `result.content` is a parsed JS object when `responseFormat` requests JSON/schema mode.
```

Validation scope (important)
The built-in validator is intentionally lightweight: it checks required keys, extra keys, and primitive types at the top level (`string`, `number`, `boolean`, `integer`).
- Extra keys are enforced only when your schema sets `additionalProperties: false` (and the schema has `properties`).
- For deeper validation needs (nested objects, enums, regex, oneOf/anyOf, etc.), run your own schema validator after the call.
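The documented scope can be mirrored by a small validator. This sketch follows the described behavior (top-level required keys, primitive types, extra keys only under `additionalProperties: false`); it is an illustration, not the library's code:

```javascript
// Minimal top-level validator matching the documented scope.
function validateTopLevel(schema, value) {
  const problems = { missingFields: [], extraFields: [], typeMismatches: [] };
  const props = schema.properties || {};

  for (const key of schema.required || []) {
    if (!(key in value)) problems.missingFields.push(key);
  }
  if (schema.additionalProperties === false && schema.properties) {
    for (const key of Object.keys(value)) {
      if (!(key in props)) problems.extraFields.push(key);
    }
  }
  for (const [key, def] of Object.entries(props)) {
    if (!(key in value)) continue;
    const v = value[key];
    const ok =
      def.type === 'string' ? typeof v === 'string' :
      def.type === 'number' ? typeof v === 'number' :
      def.type === 'boolean' ? typeof v === 'boolean' :
      def.type === 'integer' ? Number.isInteger(v) :
      true; // non-primitive types are not checked at this level
    if (!ok) problems.typeMismatches.push(key);
  }
  return problems;
}

const schema = {
  type: 'object',
  additionalProperties: false,
  properties: { answer: { type: 'string' } },
  required: ['answer']
};
console.log(validateTopLevel(schema, { answer: 'Paris', extra: 1 }));
// { missingFields: [], extraFields: ['extra'], typeMismatches: [] }
```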
Example: `additionalProperties: false` + `required`

```javascript
const result = await llm.chat(messages, {
  responseFormat: {
    type: 'json_schema',
    json_schema: {
      name: 'answer_payload',
      schema: {
        type: 'object',
        additionalProperties: false,
        properties: {
          answer: { type: 'string' }
        },
        required: ['answer']
      }
    }
  }
});
// `result.content` is { answer: string } when the model output matches the schema.
// If the model returns invalid JSON or extra keys, `llm.chat(...)` throws StructuredOutputError (e.g. `SCHEMA_MISMATCH`).
```

```javascript
// JSON alias strings (equivalent to { type: 'json_object' })
'json'
'object'
'json_object'

// OpenAI-compatible JSON mode
{ type: 'json_object' }

// When `responseFormat` requests JSON, `llm.chat(...)` resolves to a response envelope
// where `.content` is the parsed JS object.
```

Tool definition for function calling.
```typescript
interface Tool {
  type: string;
  function: {
    name: string;
    description: string;
    parameters?: Object; // OpenAI format
    input_schema?: Object; // Anthropic format
  };
}
```

Failures from `chat()` are thrown as `ResilientLLMError` (see chat() Throws above). That type is the consumer-facing surface: `code`, `retryable`, optional `metadata` (same shape as success), and `cause` for logging.
Stable string codes — `ResilientLLMErrorCode` in `lib/ResilientLLMError.ts` (including `PROVIDER_*`, structured-output codes, resilience-related codes, and configuration/capability codes). `retryable` is defined there for codes where a simple retry might help.

Use `error.code` for branching, not raw HTTP status. When a provider HTTP status was available to the library, it may also appear under `metadata` (e.g. `provider.httpStatus` / `http`).
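In application code, branching on `error.code` usually looks like a small dispatch. A hedged sketch (the handler labels are hypothetical; the codes come from the list above):

```javascript
// Hypothetical dispatch on ResilientLLMError codes.
function classifyFailure(err) {
  switch (err.code) {
    case 'PROVIDER_UNAUTHORIZED':
      return 'check-api-key';
    case 'JSON_PARSE_ERROR':
    case 'SCHEMA_MISMATCH':
      return 'fix-response-format';
    default:
      return err.retryable ? 'retry-later' : 'give-up';
  }
}

console.log(classifyFailure({ code: 'SCHEMA_MISMATCH' })); // 'fix-response-format'
```

In practice this sits in a `try { await llm.chat(...) } catch (err) { ... }` block, with `err.cause` logged server-side.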
API keys are required for most LLM providers. They can be provided in three ways (in order of precedence):
- Per-request via `llmOptions.apiKey` (highest priority)
- Via `ProviderRegistry.configure()` with direct `apiKey` parameter
- Via environment variables (lowest priority)
For advanced use cases (custom providers, multiple API keys, or programmatic configuration), see the Custom Provider Guide - Authentication Configuration.
Set at least one API key for your chosen service:
| Variable | Service | Required |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI | Yes (if using OpenAI) |
| `ANTHROPIC_API_KEY` | Anthropic | Yes (if using Anthropic) |
| `GOOGLE_API_KEY` or `GOOGLE_GENERATIVE_AI` or `GEMINI_API_KEY` | Google | Yes (if using Google) |
| `OLLAMA_API_KEY` | Ollama | No (optional) |
Note: For custom providers, use the environment variable names specified in `ProviderRegistry.configure()` via `envVarNames`.
| Variable | Default | Description |
|---|---|---|
| `PREFERRED_AI_SERVICE` | `"anthropic"` | Default AI service |
| `PREFERRED_AI_MODEL` | `"claude-3-5-sonnet-20240620"` | Default model |
| `AI_TEMPERATURE` | `0` | Default temperature |
| `MAX_TOKENS` | `2048` | Default max tokens |
| `LLM_TIMEOUT` | `60000` | Default timeout (ms) |
| `MAX_INPUT_TOKENS` | `100000` | Default max input tokens |
| `AI_TOP_P` | `0.95` | Default top-p value |
| `OLLAMA_API_URL` | `"http://localhost:11434/api/generate"` | Ollama API URL |
| `STORE_AI_API_CALLS` | `undefined` | Set to `"true"` to store API calls (OpenAI) |
OpenAI:

```json
{
  "id": "chatcmpl-123456",
  "object": "chat.completion",
  "created": 1728933352,
  "model": "gpt-4o-2024-08-06",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text",
      "tool_calls": []
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 10,
    "total_tokens": 29
  }
}
```

Anthropic:

```json
{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [{
    "type": "text",
    "text": "Response text"
  }],
  "model": "claude-3-5-sonnet-20240620",
  "usage": {
    "input_tokens": 19,
    "output_tokens": 10
  }
}
```

Google: Same format as OpenAI response.

Ollama:

```json
{
  "model": "llama3.1:8b",
  "created_at": "2024-01-01T00:00:00.000Z",
  "response": "Response text",
  "done": true,
  "context": [],
  "total_duration": 1000,
  "load_duration": 500,
  "prompt_eval_count": 10,
  "prompt_eval_duration": 200,
  "eval_count": 20,
  "eval_duration": 300
}
```

Each service has a default model configured. Use `ProviderRegistry.getDefaultModels()` to get all default models:
- Anthropic: `claude-3-5-sonnet-20240620`
- OpenAI: `gpt-5-nano`
- Google: `gemini-2.0-flash`
- Ollama: `llama3.1:8b`
Models starting with "o" (e.g., "o1", "o3") or "gpt-5" are treated as reasoning models and use different parameters:
- `max_completion_tokens` instead of `max_tokens`
- `reasoning_effort` parameter (`"low"`, `"medium"`, `"high"`, defaults to `"medium"`)
- No `temperature` or `top_p` parameters
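The parameter split above can be sketched as a request-body builder (illustrative only; the prefix check is a simplification of the stated detection rule, and the exact mapping is an assumption):

```javascript
// Build provider parameters, treating "o*" and "gpt-5*" models as reasoning models.
// Real detection is likely stricter than a bare prefix check.
function buildModelParams(model, { maxTokens, temperature, topP, reasoningEffort } = {}) {
  const isReasoning = model.startsWith('o') || model.startsWith('gpt-5');
  if (isReasoning) {
    return {
      model,
      max_completion_tokens: maxTokens,
      reasoning_effort: reasoningEffort || 'medium'
      // no temperature / top_p for reasoning models
    };
  }
  return { model, max_tokens: maxTokens, temperature, top_p: topP };
}

console.log(buildModelParams('o3', { maxTokens: 1024 }));
// { model: 'o3', max_completion_tokens: 1024, reasoning_effort: 'medium' }
```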
The library uses a token bucket algorithm with two buckets:
- Request Bucket: Limits requests per minute
- LLM Token Bucket: Limits LLM tokens per minute
Rate limits can be updated dynamically from API response headers:
- `retry-after` header is respected
- Rate limit information from responses updates buckets automatically
- `onRateLimitUpdate` callback is invoked when limits change
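A minimal token bucket can be sketched as follows (synchronous and illustrative; the library's buckets additionally wait out deficits and ingest header updates):

```javascript
// Minimal token bucket: refills continuously at `ratePerMinute`, capped at capacity.
class TokenBucket {
  constructor(ratePerMinute, now = Date.now()) {
    this.capacity = ratePerMinute;
    this.tokens = ratePerMinute;
    this.refillPerMs = ratePerMinute / 60000;
    this.lastRefill = now;
  }

  take(count, now = Date.now()) {
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens < count) return false; // caller should wait and retry
    this.tokens -= count;
    return true;
  }
}

// One bucket per limit, mirroring rateLimitConfig:
const requests = new TokenBucket(10);      // requestsPerMinute
const llmTokens = new TokenBucket(150000); // llmTokensPerMinute
console.log(requests.take(1) && llmTokens.take(500)); // true
```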
- Each retry attempt counts as a separate failure
- Circuit opens after configured failure threshold
- Cooldown period prevents immediate retries
- Success resets the failure count
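The behavior in these bullets can be sketched as a small state machine (the threshold and cooldown values are illustrative assumptions, not the library's defaults):

```javascript
// Minimal circuit breaker: opens after `threshold` failures,
// allows traffic again once `cooldownMs` has elapsed, resets on success.
class CircuitBreaker {
  constructor(threshold = 5, cooldownMs = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null;
  }

  canRequest(now = Date.now()) {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs; // cooldown elapsed
  }

  onSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  onFailure(now = Date.now()) {
    this.failures += 1; // each retry attempt counts as a separate failure
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```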
Provide a cache store object in constructor options:
```javascript
const cacheStore = {};
const llm = new ResilientLLM({ cacheStore });
```

Cache keys are SHA-256 hashes of:
- API URL
- Request body (JSON stringified)
- Headers (JSON stringified)
- Only successful responses (status 200) are cached
- Cache is checked before making HTTP requests
- Cache hits return immediately without API call
Use abort() method to cancel all ongoing operations:
```javascript
const llm = new ResilientLLM({ /* ... */ });

const promise = llm.chat(conversationHistory);
llm.abort(); // Cancels the request
```

Timeouts are enforced using AbortController:
- Timeout applies to entire operation (including retries)
- On timeout, `AbortController` aborts the HTTP request
- `chat()` rejects with `ResilientLLMError`; the original timeout is typically on `error.cause` (`name` may be `TimeoutError`)
All providers are managed through ProviderRegistry. The implementation uses:
- `ProviderRegistry.get(providerName)` - Get provider configuration
- `ProviderRegistry.getChatApiUrl(providerName)` - Get chat API URL
- `ProviderRegistry.getChatConfig(providerName)` - Get chat configuration
- `ProviderRegistry.buildApiUrl(providerName, url)` - Build API URL with query params if needed
- `ProviderRegistry.buildAuthHeaders(providerName, apiKey, defaultHeaders)` - Build authentication headers
- `ProviderRegistry.hasApiKey(providerName)` - Check if API key is available
See Custom Provider Guide for details on configuring providers.
- System messages are extracted and sent separately
- Tool definitions use `input_schema` instead of `parameters`
- API version header: `anthropic-version: 2023-06-01`
- Uses `x-api-key` header instead of `Authorization`
- Supports function calling with `tools` parameter
- Supports `response_format` for JSON mode
- Uses standard `Authorization: Bearer <token>` header
- Can store API calls if `STORE_AI_API_CALLS=true`
- Uses OpenAI-compatible endpoint
- Same format as OpenAI for requests/responses
- Requires `GEMINI_API_KEY` environment variable
- Authentication: uses header authentication (`Authorization: Bearer {key}`) for chat endpoints, and query parameter authentication (`?key=...`) for the models endpoint
- Defaults to `http://localhost:11434/api/generate`
- Can be overridden with the `OLLAMA_API_URL` environment variable
- API key is optional
- Uses a different response format