ReqLLM uses a comprehensive fixture-based testing system to ensure reliability across all supported models and providers. This guide explains how we validate "Supported Models" and how the testing infrastructure works.
The testing system validates models through the mix req_llm.model_compat task, which runs capability-focused tests against models selected from the registry.
# Validate all models with passing fixtures (fastest)
mix req_llm.model_compat
# Alias
mix mc
This runs tests using cached fixtures - no API calls are made. It validates models that have previously passing test results stored in priv/supported_models.json.
# Validate all Anthropic models
mix mc anthropic
# Validate specific model
mix mc "openai:gpt-4o"
# Validate all models for a provider
mix mc "xai:*"
# List all available models from registry
mix mc --available
To test against live APIs and (re)generate fixtures:
# Re-record fixtures for xAI models
mix mc "xai:*" --record
# Re-record all models (not recommended, expensive)
mix mc "*:*" --record
# Test sample models per provider (uses config/config.exs sample list)
mix mc --sample
# Test specific provider samples
mix mc --sample anthropic
Model metadata lives in priv/models_dev/*.json files, automatically synced from models.dev via mix req_llm.model_sync.
Each model entry includes:
- Capabilities (tool_call, reasoning, attachment, temperature)
- Modalities (input: [:text, :image], output: [:text])
- Limits (context, output token limits)
- Costs (input, output per 1M tokens)
- API-specific metadata
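Because costs are recorded per 1M tokens, turning token usage into a cost figure is a simple scaling. The following is a minimal sketch, not ReqLLM's actual cost code: the `CostSketch` module, the map shapes, and the prices are hypothetical and chosen only for illustration.

```elixir
defmodule CostSketch do
  # Hypothetical helper. Prices are per 1M tokens, as in the metadata above.
  def cost(%{input: input_price, output: output_price}, %{input_tokens: i, output_tokens: o}) do
    (i * input_price + o * output_price) / 1_000_000
  end
end

# Made-up prices, for illustration only.
prices = %{input: 3.0, output: 15.0}
usage = %{input_tokens: 1_000, output_tokens: 500}

IO.inspect(CostSketch.cost(prices, usage))
```

With these made-up numbers the result is (1,000 × 3.0 + 500 × 15.0) / 1,000,000, i.e. roughly 0.0105.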
The priv/supported_models.json file tracks which models have passing fixtures. This file is auto-generated and should not be manually edited.
Tests use the ReqLLM.ProviderTest.Comprehensive macro (in test/support/provider_test/comprehensive.ex), which generates up to 9 focused tests per model based on capabilities:
- Basic generate_text (non-streaming) - All models
- Streaming with system context + creative params - Models with streaming support
- Token limit constraints - All models
- Usage metrics and cost calculations - All models
- Tool calling - multi-tool selection - Models with :tool_call capability
- Tool calling - no tool when inappropriate - Models with :tool_call capability
- Object generation (non-streaming) - Models with object generation support
- Object generation (streaming) - Models with object generation support
- Reasoning/thinking tokens - Models with :reasoning capability
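The capability gating in the list above can be sketched as a plain function. This is a hypothetical illustration, not the macro's actual implementation: the `ScenarioSketch` module, the shape of the capability map, and the `:streaming` and `:object` keys are assumptions. The scenario names mirror the fixture file names used elsewhere in this guide.

```elixir
defmodule ScenarioSketch do
  # Scenarios that run for every model.
  @always [:basic, :token_limit, :usage]

  # Scenarios gated on a capability flag (the flag names are assumptions).
  @gated [
    {:streaming, [:streaming]},
    {:tool_call, [:tool_multi, :no_tool]},
    {:object, [:object_basic, :object_streaming]},
    {:reasoning, [:reasoning_basic]}
  ]

  def scenarios_for(capabilities) when is_map(capabilities) do
    gated =
      for {capability, scenarios} <- @gated,
          Map.get(capabilities, capability, false),
          scenario <- scenarios,
          do: scenario

    @always ++ gated
  end
end

IO.inspect(ScenarioSketch.scenarios_for(%{tool_call: true, reasoning: false}))
# prints [:basic, :token_limit, :usage, :tool_multi, :no_tool]
```

A model with every flag enabled yields all nine scenarios, which matches the "up to 9" figure above.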
test/coverage/
├── anthropic/
│ └── comprehensive_test.exs
├── openai/
│ └── comprehensive_test.exs
├── google/
│ └── comprehensive_test.exs
└── ...
Each provider has a single comprehensive test file:
defmodule ReqLLM.Coverage.Anthropic.ComprehensiveTest do
use ReqLLM.ProviderTest.Comprehensive, provider: :anthropic
end
The macro automatically:
- Selects models from ModelMatrix based on provider and operation type
- Generates tests for each model based on capabilities
- Handles fixture recording and replay
- Tags tests with provider, model, and scenario
A model is considered "supported" when it:
- Has metadata in priv/models_dev/<provider>.json
- Passes comprehensive tests for its advertised capabilities
- Has fixture evidence stored for validation
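The three criteria above amount to a simple conjunction. As a rough sketch only (the `SupportSketch` module and the boolean field names are hypothetical and do not reflect how priv/supported_models.json is actually produced):

```elixir
defmodule SupportSketch do
  # Hypothetical record shape: each boolean stands in for one criterion above.
  def supported?(%{has_metadata: true, tests_pass: true, has_fixtures: true}), do: true
  def supported?(%{}), do: false

  # Counts the records that meet all three criteria.
  def count_supported(models), do: Enum.count(models, &supported?/1)
end

# Placeholder model ids, for illustration only.
candidates = [
  %{id: "example:model-a", has_metadata: true, tests_pass: true, has_fixtures: true},
  %{id: "example:model-b", has_metadata: true, tests_pass: false, has_fixtures: true}
]

IO.inspect(SupportSketch.count_supported(candidates))
# prints 1
```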
The count you see in documentation ("135+ models currently pass our comprehensive fixture-based test suite") comes from models in priv/supported_models.json.
Tests use structured tags for precise filtering:
@moduletag :coverage # All coverage tests
@moduletag provider: "anthropic" # Provider filter
@describetag model: "claude-3-5-sonnet" # Model filter (without provider prefix)
@tag scenario: :basic # Scenario filter
Run specific subsets:
# All coverage tests
mix test --only coverage
# Specific provider
mix test --only "provider:anthropic"
# Specific scenario
mix test --only "scenario:basic"
mix test --only "scenario:streaming"
mix test --only "scenario:tool_multi"
# Specific model
mix test --only "model:claude-3-5-haiku-20241022"
# Combine filters
mix test --only "provider:openai" --only "scenario:basic"
# Use cached fixtures (default, no API calls)
mix mc
# Record new fixtures (makes live API calls)
REQ_LLM_FIXTURES_MODE=record mix mc
# OR
mix mc --record
# Test all available models
REQ_LLM_MODELS="all" mix mc
# Test all models from a provider
REQ_LLM_MODELS="anthropic:*" mix mc
# Test specific models (comma-separated)
REQ_LLM_MODELS="openai:gpt-4o,anthropic:claude-3-5-sonnet" mix mc
# Sample N models per provider
REQ_LLM_SAMPLE=2 mix mc
# Exclude specific models
REQ_LLM_EXCLUDE="gpt-4o-mini,gpt-3.5-turbo" mix mc
# Verbose fixture debugging
REQ_LLM_DEBUG=1 mix mc
Fixtures are stored next to test files:
test/coverage/<provider>/fixtures/
├── basic.json
├── streaming.json
├── token_limit.json
├── usage.json
├── tool_multi.json
├── no_tool.json
├── object_basic.json
├── object_streaming.json
└── reasoning_basic.json
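Given the layout above, the fixture path for a provider and scenario can be derived mechanically. The helper below is a hypothetical sketch (the `FixturePathSketch` module is not part of ReqLLM) and assumes the directory structure is exactly as shown:

```elixir
defmodule FixturePathSketch do
  # Scenario names taken from the fixture file names listed above.
  @scenarios ~w(basic streaming token_limit usage tool_multi no_tool
                object_basic object_streaming reasoning_basic)a

  # Builds test/coverage/<provider>/fixtures/<scenario>.json.
  # Unknown scenarios fail the guard and raise FunctionClauseError.
  def path(provider, scenario) when is_atom(provider) and scenario in @scenarios do
    Path.join(["test", "coverage", Atom.to_string(provider), "fixtures", "#{scenario}.json"])
  end
end

IO.puts(FixturePathSketch.path(:anthropic, :basic))
# prints test/coverage/anthropic/fixtures/basic.json
```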
Fixtures capture the complete API response:
{
"captured_at": "2025-01-15T10:30:00Z",
"model_spec": "anthropic:claude-3-5-sonnet-20241022",
"scenario": "basic",
"result": {
"ok": true,
"response": {
"id": "msg_123",
"model": "claude-3-5-sonnet-20241022",
"message": {...},
"usage": {...}
}
}
}
The fixture system supports parallel test execution:
- Tests run concurrently for speed
- State tracking skips models with passing fixtures
- Use --record or --record-all to regenerate
To add a new provider:
- Implement provider module and metadata
- Create test file using Comprehensive macro
- Record initial fixtures: mix mc "<provider>:*" --record
- Verify all tests pass: mix mc "<provider>"
To update coverage when new models appear:
- Sync latest model metadata: mix req_llm.model_sync
- Record fixtures for new models: mix mc "<provider>:new-model" --record
- Validate updated coverage: mix mc "<provider>"
Periodically refresh fixtures to catch API changes:
# Refresh specific provider
mix mc "anthropic:*" --record
# Refresh specific capability
REQ_LLM_FIXTURES_MODE=record mix test --only "scenario:streaming"
# Refresh all (expensive, requires all API keys)
mix mc "*:*" --record
We guarantee that all "supported models" (those counted in our documentation):
- Have passing fixtures for basic functionality
- Are tested against live APIs before fixture capture
- Pass capability-focused tests for advertised features
- Are regularly refreshed to catch provider-side changes
For each supported model:
- ✅ Text generation (streaming and non-streaming)
- ✅ Token limits and truncation behavior
- ✅ Usage metrics and cost calculation
- ✅ Tool calling (if advertised)
- ✅ Object generation (if advertised)
- ✅ Reasoning tokens (if advertised)
What is not guaranteed:
- Complex edge cases beyond basic capabilities
- Provider-specific features not in model metadata
- Real-time behavior (fixtures may be cached)
- Exact API response formats (providers may change)
If tests fail with fixture mismatches:
# Re-record the specific scenario
mix mc "provider:model" --record
Tests skip if API key is unavailable:
# Set in .env file
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
Enable verbose output:
REQ_LLM_DEBUG=1 mix test --only "provider:anthropic" --only "scenario:basic"
- Run locally before CI: mix mc before committing
- Record incrementally: Don't re-record all fixtures at once
- Use samples for development: mix mc --sample for quick validation
- Keep fixtures fresh: Refresh fixtures when providers update APIs
- Tag tests appropriately: Use semantic tags for precise test selection
# Validation (using fixtures)
mix mc # All models with passing fixtures
mix mc anthropic # All Anthropic models
mix mc "openai:gpt-4o" # Specific model
mix mc --sample # Sample models per provider
mix mc --available # List all registry models
# Recording (live API calls)
mix mc --record # Re-record passing models
mix mc "xai:*" --record # Re-record xAI models
mix mc "<provider>:*" --record # Re-record specific provider
# Environment variables
REQ_LLM_FIXTURES_MODE=record # Force recording
REQ_LLM_MODELS="pattern" # Model selection pattern
REQ_LLM_SAMPLE=N # Sample N per provider
REQ_LLM_EXCLUDE="model1,model2" # Exclude models
REQ_LLM_DEBUG=1 # Verbose output
The fixture-based testing system provides:
- Fast local validation with cached fixtures
- Comprehensive coverage across capabilities
- Parallel execution for speed
- Clear model support guarantees backed by test evidence
- Easy provider addition with minimal boilerplate
This system is how ReqLLM backs up the claim of "135+ supported models" - each one has fixture evidence of passing comprehensive capability tests.