Context for AI agents (Claude Code, Copilot, Cursor, etc.) working on this codebase. Read this fully before making changes.
Bifrost is a high-performance AI gateway that unifies 20+ LLM providers behind a single OpenAI-compatible API with ~11µs overhead at 5,000 RPS. It also serves as an MCP (Model Context Protocol) gateway, turning static chat models into tool-calling agents.
GitHub: maximhq/bifrost
bifrost/
├── core/ # Go core library — the engine
│ ├── bifrost.go # Main struct, request queuing, provider lifecycle (~3.4K lines)
│ ├── inference.go # Inference routing, fallbacks, streaming dispatch (~1.9K lines)
│ ├── mcp.go # MCP integration entry point
│ ├── schemas/ # ALL shared Go types — 41 files
│ │ ├── bifrost.go # BifrostConfig, ModelProvider enum, RequestType enum, context keys
│ │ ├── provider.go # Provider interface (30+ methods), NetworkConfig, ProviderConfig
│ │ ├── plugin.go # LLMPlugin, MCPPlugin, HTTPTransportPlugin, ObservabilityPlugin
│ │ ├── context.go # BifrostContext (custom context.Context with mutable values)
│ │ ├── chatcompletions.go # Chat completion request/response types
│ │ ├── responses.go # OpenAI Responses API types
│ │ ├── embedding.go # Embedding types
│ │ ├── images.go # Image generation types
│ │ ├── batch.go # Batch operation types
│ │ ├── files.go # File management types
│ │ ├── mcp.go # MCP types
│ │ ├── trace.go # Tracer interface
│ │ └── logger.go # Logger interface
│ ├── providers/ # 20+ provider implementations
│ │ ├── openai/ # Reference implementation (largest, most complete)
│ │ ├── anthropic/ # Non-OpenAI-compatible example
│ │ ├── bedrock/ # AWS event-stream protocol
│ │ ├── gemini/ # Google-specific API shape
│ │ ├── groq/ # OpenAI-compatible (minimal, delegates to openai/)
│ │ └── utils/ # Shared: HTTP client, SSE parsing, error handling, scanner pool
│ ├── pool/ # Generic Pool[T] — dual-mode (prod: sync.Pool, debug: full tracking)
│ │ ├── pool_prod.go # Zero-overhead sync.Pool wrapper (default build)
│ │ └── pool_debug.go # Double-release/use-after-release/leak detection (-tags pooldebug)
│ ├── mcp/ # MCP protocol implementation
│ │ ├── agent.go # Agent orchestration loop (multi-turn tool calling)
│ │ ├── clientmanager.go # MCP client lifecycle management
│ │ ├── toolmanager.go # Tool registration, discovery, filtering
│ │ ├── healthmonitor.go # Client health monitoring
│ │ └── codemode/starlark/ # Starlark sandbox for code-mode execution
│ └── internal/
│ ├── llmtests/ # LLM integration test infra (48 files, scenario-based)
│ └── mcptests/ # MCP/Agent test infra (40+ files, mock-based)
│
├── framework/ # Data persistence, streaming, ecosystem utilities
│ ├── configstore/ # Config storage backends (file, postgres)
│ ├── logstore/ # Log storage backends (file, postgres)
│ ├── vectorstore/ # Vector storage (Weaviate, Qdrant, Redis, Pinecone)
│ ├── streaming/ # Streaming accumulator, delta copying, response marshaling
│ │ ├── accumulator.go # Chunk accumulation into full response (~24KB)
│ │ ├── chat.go # Chat stream handling (~17KB)
│ │ └── responses.go # Response stream marshaling (~35KB)
│ ├── modelcatalog/ # Model metadata registry
│ ├── tracing/ # Distributed tracing helpers
│ └── encrypt/ # Encryption utilities
│
├── transports/
│ ├── config.schema.json # JSON Schema — THE source of truth for config.json (~2700 lines)
│ └── bifrost-http/ # HTTP gateway transport
│ ├── server/ # Server lifecycle, route registration
│ ├── handlers/ # 27 HTTP endpoint handlers
│ │ ├── inference.go # Chat/text completions, responses API (~109KB)
│ │ ├── mcpinference.go # MCP tool execution
│ │ ├── governance.go # Virtual keys, teams, customers, budgets (~100KB)
│ │ ├── providers.go # Provider CRUD, key management
│ │ ├── mcp.go # MCP client registry management
│ │ ├── logging.go # Log queries, stats, histograms
│ │ ├── config.go # System configuration
│ │ ├── plugins.go # Plugin CRUD
│ │ ├── cache.go # Cache management
│ │ ├── session.go # Auth/session management
│ │ ├── health.go # Health checks
│ │ ├── mcpserver.go # MCP server (SSE/streamable HTTP)
│ │ ├── websocket.go # WebSocket handler
│ │ ├── devpprof.go # Pool debug profiler endpoint (~23KB)
│ │ └── middlewares.go # Middleware definitions
│ ├── lib/ # ChainMiddlewares, config, context conversion
│ └── integrations/ # SDK compatibility layers
│ ├── openai.go # OpenAI SDK drop-in compatibility
│ ├── anthropic.go # Anthropic SDK compatibility
│ ├── bedrock.go # AWS Bedrock SDK compatibility
│ ├── genai.go # Google GenAI SDK compatibility
│ ├── langchain.go # LangChain compatibility
│ ├── litellm.go # LiteLLM compatibility
│ └── pydanticai.go # PydanticAI compatibility
│
├── plugins/ # Go plugins — each has own go.mod
│ ├── governance/ # Budget, rate limiting, virtual keys, routing, RBAC
│ ├── telemetry/ # Prometheus metrics, push gateway
│ ├── logging/ # Request/response audit logging
│ ├── semanticcache/ # Semantic response caching via vector store
│ ├── otel/ # OpenTelemetry tracing
│ ├── mocker/ # Mock responses for testing
│ ├── jsonparser/ # JSON extraction utilities
│ ├── maxim/ # Maxim observability
│ └── compat/ # LiteLLM SDK compatibility (HTTP transport)
│
├── ui/ # React + vite web interface
│ ├── app/workspace/ # Feature pages (20+ workspace sections)
│ ├── components/ # Shared React components
│ └── lib/ # Constants, utilities, types
│
├── tests/e2e/ # Playwright E2E tests
│ ├── core/ # Fixtures, page objects, helpers, API actions
│ └── features/ # Per-feature test suites
│
├── docs/ # Mintlify MDX documentation
│ ├── docs.json # Navigation config
│ ├── media/ # Screenshots (ui-*.png naming convention)
│ └── (architecture|features|providers|mcp|plugins|enterprise|...)
│
├── .claude/skills/ # Claude Code skill definitions (4 skills)
├── go.work # Go workspace — requires Go 1.26.1
├── Makefile # Build, test, dev commands (1300+ lines)
└── terraform/ # Infrastructure as Code
Bifrost is a multi-module Go workspace. Each module has its own go.mod:
go.work
├── core/go.mod # github.com/maximhq/bifrost/core
├── framework/go.mod # github.com/maximhq/bifrost/framework
├── transports/go.mod # github.com/maximhq/bifrost/transports
└── plugins/*/go.mod # 9 plugin modules (governance, telemetry, logging, etc.)
Rules:
- Run `go mod tidy` in the specific module directory, not the root
- Cross-module imports resolve via the workspace locally, but need explicit `require` entries in `go.mod` for releases
- The workspace requires Go 1.26.1 (`go.work` directive)
# Development
make dev # Full local dev (UI + API with hot reload via air)
make build # Build bifrost-http binary
# Core tests (provider integration tests — hit live APIs)
make test-core # All providers
make test-core PROVIDER=openai # Specific provider
make test-core PROVIDER=openai TESTCASE=TestSimpleChat # Specific test
make test-core PATTERN=TestStreaming # Tests matching pattern
make test-core DEBUG=1 # With Delve debugger on :2345
# MCP/Agent tests (mock-based, no live APIs)
make test-mcp # All MCP tests
make test-mcp TESTCASE=TestAgentLoop # Specific test
make test-mcp TYPE=agent # By category (agent|tool|connection|codemode)
# Plugin tests
make test-plugins # All plugins
make test-governance # Governance plugin specifically
# Integration tests (SDK compatibility)
make test-integrations-py # Python SDK tests
make test-integrations-ts # TypeScript SDK tests
# E2E tests (Playwright, requires running dev server)
make run-e2e # All E2E tests
make run-e2e FLOW=providers # Specific feature
# Code quality
make lint # Linting
make fmt              # Format code

Client HTTP Request
→ FastHTTP Transport (parsing, validation ~2µs)
→ SDK Integration Layer (OpenAI/Anthropic/Bedrock format → Bifrost format)
→ Middleware Chain (lib.ChainMiddlewares, applied per-route)
→ HTTPTransportPreHook (HTTP-level plugins, can short-circuit)
→ PreLLMHook Pipeline (auth, rate-limit, cache check — registration order)
→ MCP Tool Discovery & Injection (if tool_choice present)
→ Provider Queue (channel-based, per-provider isolation)
→ Worker picks up request
→ Key Selection (~10ns weighted random)
→ Provider API Call (fasthttp client, connection pooling)
→ Response / SSE Stream
→ PostLLMHook Pipeline (reverse order of PreLLMHooks)
→ Tool Execution Loop (if tool_calls in response, MCP agent loop)
→ HTTPTransportPostHook (reverse order)
→ Response Serialization
→ HTTP Response to Client
- Provider isolation: Each provider has its own worker pool and queue. One provider going down doesn't cascade to others.
- Channel-based async: Request routing uses Go channels (`chan *ChannelMessage`), not mutexes. The `ProviderQueue` struct manages channel lifecycle with atomic flags.
- Object pooling everywhere: `sync.Pool` wrappers reduce GC pressure. Pools exist for: channel messages, response channels, error channels, stream channels, plugin pipelines, MCP requests, HTTP request/response objects, scanner buffers.
- Plugin pipeline symmetry: Pre-hooks execute in registration order, post-hooks in reverse order (LIFO). For every pre-hook executed, the corresponding post-hook is guaranteed to run.
- Streaming: SSE chunks flow through `chan chan *schemas.BifrostStreamChunk`. Chunks are accumulated into a full response for post-hooks via `framework/streaming/accumulator.go`.
BifrostContext (core/schemas/context.go) is a custom context.Context with thread-safe mutable values. Unlike standard Go contexts, values can be set after creation:
```go
ctx := schemas.NewBifrostContext(parent, deadline)
ctx.SetValue(key, value)  // Thread-safe, uses RWMutex
ctx.WithValue(key, value) // Chainable variant
```

Reserved context keys (set by Bifrost internals — DO NOT set manually):
- `BifrostContextKeySelectedKeyID`/`Name` — Set by governance plugin
- `BifrostContextKeyGovernance*` — Set by governance plugin
- `BifrostContextKeyNumberOfRetries`, `BifrostContextKeyFallbackIndex` — Set by retry/fallback logic
- `BifrostContextKeyStreamEndIndicator` — Set by streaming infrastructure
- `BifrostContextKeyTrace*`, `BifrostContextKeySpan*` — Set by tracing middleware
User-settable keys (plugins and handlers can set these):
- `BifrostContextKeyVirtualKey` (`x-bf-vk`) — Virtual key for governance
- `BifrostContextKeyAPIKeyName` (`x-bf-api-key`) — Explicit key selection by name
- `BifrostContextKeyAPIKeyID` (`x-bf-api-key-id`) — Explicit key selection by ID (takes priority over name)
- `BifrostContextKeyRequestID` — Request ID
- `BifrostContextKeyExtraHeaders` — Extra headers to forward to the provider
- `BifrostContextKeyURLPath` — Custom URL path for the provider
- `BifrostContextKeySkipKeySelection` — Skip key selection (pass empty key)
- `BifrostContextKeyUseRawRequestBody` — Send raw body directly to the provider
Gotcha: `BlockRestrictedWrites()` silently drops writes to reserved keys. This prevents plugins from accidentally overwriting internal state.
There are two categories of providers:
Category 1: Non-OpenAI-compatible (Anthropic, Bedrock, Gemini, Cohere, HuggingFace, Replicate, ElevenLabs):
core/providers/<name>/
├── <name>.go # Controller: constructor, interface methods, HTTP orchestration
├── <name>_test.go # Tests
├── types.go # ALL provider-specific structs (PascalCase prefixed with provider name)
├── utils.go # Constants, base URLs, helpers (camelCase for unexported)
├── errors.go # Error parsing: provider HTTP error → *schemas.BifrostError
├── chat.go # Chat request/response converters
├── embedding.go # Embedding converters (if supported)
├── images.go # Image generation (if supported)
├── speech.go # TTS/STT (if supported)
└── responses.go # Responses API + streaming converters
Category 2: OpenAI-compatible (Groq, Cerebras, Ollama, Perplexity, OpenRouter, Parasail, Nebius, xAI, SGL):
core/providers/<name>/
├── <name>.go # Minimal — constructor + delegates to openai.HandleOpenAI* functions
└── <name>_test.go # Tests
Converter function naming convention:
- `To<ProviderName><Feature>Request()` — Bifrost schema → Provider API format
- `ToBifrost<Feature>Response()` — Provider API format → Bifrost schema
- These must be pure transformation functions — no HTTP calls, no logging, no side effects
Provider constructor pattern:
```go
func NewProvider(config schemas.ProviderConfig) (*Provider, error) {
    // Validate config, set up fasthttp.Client with connection pooling
    client := &fasthttp.Client{
        MaxConnsPerHost:     config.NetworkConfig.MaxConnsPerHost, // configurable, default 5000
        MaxIdleConnDuration: 30 * time.Second,
    }
    // After ConfigureProxy/ConfigureDialer/ConfigureTLS, build a sibling client
    // for streaming. BuildStreamingClient zeros ReadTimeout/WriteTimeout/MaxConnDuration
    // so streams aren't killed by fasthttp's whole-response deadline; per-chunk idle
    // is enforced at the app layer via NewIdleTimeoutReader.
    streamingClient := providerUtils.BuildStreamingClient(client)
    return &Provider{client: client, streamingClient: streamingClient /* ... */}, nil
}
```

Streaming vs unary client: Every provider holds two clients — `client` for unary requests (`ReadTimeout=30s` bounds the whole response) and `streamingClient` for SSE / EventStream / chunked paths (`ReadTimeout=0`; the per-chunk `NewIdleTimeoutReader` is the only governor). Pass `provider.streamingClient` to every `Handle*Streaming` / `Handle*StreamRequest` helper and to direct `Do` calls inside `*Stream` methods. For new providers, apply the same pattern — missing the switch means streams get killed at 30s.
Note: Bedrock uses `net/http` (not fasthttp) with HTTP/2 support. Its `http.Transport` is configured with `ForceAttemptHTTP2: true` and `MaxConnsPerHost` from NetworkConfig to allow multiple HTTP/2 connections when the server's per-connection stream limit (100 for AWS Bedrock) is reached. Use `providerUtils.BuildStreamingHTTPClient(client)` to derive the streaming variant — it shares the base `Transport` (safe for concurrent reuse) but clears `Client.Timeout`.
core/schemas/provider.go defines the Provider interface with 30+ methods. Every provider must implement all of them (returning "not supported" for unsupported operations). The interface covers:
- `ListModels`
- `ChatCompletion`, `ChatCompletionStream`
- `Responses`, `ResponsesStream` (OpenAI Responses API)
- `TextCompletion`, `TextCompletionStream`
- `Embedding`, `Speech`, `SpeechStream`, `Transcription`, `TranscriptionStream`
- `ImageGeneration`, `ImageGenerationStream`, `ImageEdit`, `ImageEditStream`, `ImageVariation`
- `CountTokens`
- `Batch*` (Create, List, Retrieve, Cancel, Results)
- `File*` (Upload, List, Retrieve, Delete, Content)
- `Container*` and `ContainerFile*` (Create, List, Retrieve, Delete, Content)
Streaming methods receive a PostHookRunner callback and return chan *BifrostStreamChunk:
```go
ChatCompletionStream(ctx *BifrostContext, postHookRunner PostHookRunner, key Key, request *BifrostChatRequest) (chan *BifrostStreamChunk, *BifrostError)
```

Each provider has `errors.go` with an `ErrorConverter` function:

```go
type ErrorConverter func(resp *fasthttp.Response, requestType schemas.RequestType, providerName schemas.ModelProvider, model string) *schemas.BifrostError
```

The shared utility `providerUtils.HandleProviderAPIError()` handles common HTTP error parsing. Provider-specific parsers add extra field mapping. Errors always carry metadata:

```go
bifrostErr.ExtraFields.Provider = providerName
bifrostErr.ExtraFields.ModelRequested = model
bifrostErr.ExtraFields.RequestType = requestType
```

Four plugin interfaces exist:
| Interface | Hook Methods | When Called |
|---|---|---|
| `LLMPlugin` | `PreLLMHook`, `PostLLMHook` | Every LLM request (SDK + HTTP) |
| `MCPPlugin` | `PreMCPHook`, `PostMCPHook` | Every MCP tool execution |
| `HTTPTransportPlugin` | `HTTPTransportPreHook`, `HTTPTransportPostHook`, `HTTPTransportStreamChunkHook` | HTTP gateway only (not Go SDK) |
| `ObservabilityPlugin` | `Inject(ctx, trace)` | Async, after response written to wire |
Key plugin behaviors:
- Plugin errors are logged as warnings, never returned to the caller
- Pre-hooks can short-circuit by returning `*LLMPluginShortCircuit` (cache hit, auth failure, rate limit)
- Post-hooks receive both response and error — either can be nil. Plugins can recover from errors (set error to nil, provide response) or invalidate responses (set response to nil, provide error)
- `BifrostError.AllowFallbacks` controls whether fallback providers are tried: `nil` or `&true` = allow, `&false` = block
- `HTTPTransportStreamChunkHook` is called per-chunk during streaming — it can modify, skip, or abort the stream
core/pool/ provides Pool[T] with two build modes:
```go
// Production (default): zero-overhead sync.Pool wrapper
// Debug (-tags pooldebug): tracks double-release, use-after-release, leaks with stack traces
p := pool.New[MyType]("descriptive-name", func() *MyType { return &MyType{} })
obj := p.Get()
// ... use obj ...
// MUST reset ALL fields before Put — pool does not auto-reset
p.Put(obj)
```

Acquire/Release pattern for types with complex reset logic (used in schemas/plugin.go):

```go
req := schemas.AcquireHTTPRequest()   // Get from pool, pre-allocated maps
defer schemas.ReleaseHTTPRequest(req) // Clears all maps and fields, returns to pool
```

Handler pattern: Handlers are structs with injected dependencies:
```go
type CompletionHandler struct {
    client       *bifrost.Bifrost
    handlerStore lib.HandlerStore
    config       *lib.Config
}
```

Route registration: Each handler implements `RegisterRoutes(router, middlewares...)` — routes get middleware chains applied per-route via `lib.ChainMiddlewares()`.
SDK integration layers (transports/bifrost-http/integrations/) provide request/response converters between provider-native SDK formats and Bifrost's internal format. This enables drop-in replacement of OpenAI SDK, Anthropic SDK, AWS Bedrock SDK, Google GenAI SDK, LangChain, and LiteLLM.
Every pooled object must have all fields zeroed before pool.Put(). Stale data leaks between requests. The debug build catches double-release and use-after-release but not missing resets.
```go
// WRONG — stale data from previous request leaks to the next user
pool.Put(msg)

// RIGHT
msg.Response = nil
msg.Error = nil
msg.Context = nil
msg.ResponseStream = nil
pool.Put(msg)
```

ProviderQueue uses atomic flags and sync.Once to prevent "send on closed channel" panics:
```go
type ProviderQueue struct {
    queue      chan *ChannelMessage
    done       chan struct{}
    closing    uint32    // atomic: 0=open, 1=closing
    signalOnce sync.Once // ensure signal fires only once
    closeOnce  sync.Once // ensure close fires only once
}
```

Always check the atomic closing flag before sending. Never close a channel without this pattern.
`RetryBackoffInitial` and `RetryBackoffMax` are `time.Duration` (nanoseconds) in Go but milliseconds (integers) in JSON. Custom `MarshalJSON`/`UnmarshalJSON` handles the conversion. If adding new duration fields to any config struct, follow this pattern exactly.
`NetworkConfig.ExtraHeaders` is deep-copied in `CheckAndSetDefaults()` to prevent data races between concurrent requests. Apply the same `maps.Copy()` pattern to any new map fields in config structs.
Adding a new operation type requires changes across the entire codebase:
- Add method to `Provider` interface in `core/schemas/provider.go`
- Implement in all 20+ providers (most return "not supported")
- Add `RequestType` constant in `core/schemas/bifrost.go`
- Add to `AllowedRequests` struct and `IsOperationAllowed()` switch
- Add handler endpoint in `transports/bifrost-http/handlers/`
- Wire up in `core/bifrost.go` and `core/inference.go`
Groq, Cerebras, Ollama, Perplexity, OpenRouter, Parasail, Nebius, xAI, and SGL all delegate to `openai.HandleOpenAI*` functions. Any change to OpenAI converter logic affects all of them. Always test broadly: `make test-core` (all providers).
The SSE scanner buffer pool in core/providers/utils/utils.go starts at 4KB. Buffers grow dynamically but those exceeding 64KB are discarded (not returned to pool) to prevent memory bloat. Be aware when working with providers that send very large SSE events.
Pre-hooks: registration order (first registered → first to run). Post-hooks: reverse order. This creates "wrapping" semantics — the first plugin registered is the outermost wrapper (its pre-hook runs first, post-hook runs last). Changing registration order changes behavior.
When a provider fails and the request falls to a fallback, the entire plugin pipeline re-executes from scratch. Governance checks, caching, and logging all run again for each attempt. Intentional, but surprising when debugging request counts or cost tracking.
A nil *AllowedRequests means "all operations allowed." A non-nil value only allows fields explicitly set to true. This applies to both ProviderConfig.AllowedRequests and CustomProviderConfig.AllowedRequests.
When `BlockRestrictedWrites()` is active, writes to reserved keys (governance IDs, retry counts, fallback index, etc.) are silently ignored — no error. If your plugin needs to pass data through context, use your own custom key type.
Bifrost uses github.com/valyala/fasthttp for provider HTTP calls. The API is different from net/http:
- Use `fasthttp.AcquireRequest()` / `fasthttp.ReleaseRequest()` for lifecycle
- `fasthttp.Client` pools connections per-host (`NetworkConfig.MaxConnsPerHost`, default 5000, 30s idle)
- Request/response bodies are accessed via `resp.Body()` (returns `[]byte`, not `io.Reader`)
- Exception: Bedrock uses `net/http` (for AWS SigV4 signing) with `http.Transport` configured for HTTP/2 multi-connection support
JSON marshaling in hot paths uses github.com/bytedance/sonic for performance. core/schemas/ uses standard encoding/json for custom marshaling (e.g., NetworkConfig). Don't mix them accidentally.
Bifrost uses atomic.Pointer for providers and plugins lists. On updates: create new slice → atomically swap pointer. Never mutate the slice in place — concurrent readers would see partial state.
Tool access follows: Global filter → Client-level filter → Tool-level filter → Per-request filter (HTTP headers). All four levels must agree for a tool to be available. Changes to filtering logic must respect this hierarchy.
transports/config.schema.json (~2700 lines) is the authoritative definition for all config.json fields. Documentation examples must match. When adding config fields: update schema first → handlers → docs.
E2E tests depend on data-testid attributes. Convention: data-testid="<entity>-<element>-<qualifier>". If you rename or remove one, search tests/e2e/ for references. If you add new interactive elements, add data-testid.
In tests/e2e/core/, never marshal API payloads to a Record/Map/plain-object and then re-serialize. Field ordering matters for backend validation and snapshot comparisons. Construct payloads as object literals with fields in the intended order and pass directly to Playwright's request.post({ data }). Avoid Object.fromEntries(), JSON.parse(JSON.stringify(...)) round-trips, or destructuring into an intermediate Record<string, unknown> — these can silently reorder fields.
- Create `core/providers/<name>/` with files per the pattern (see "Provider Implementation" above)
- Add `ModelProvider` constant in `core/schemas/bifrost.go`
- Add to `StandardProviders` list in `core/schemas/bifrost.go`
- Register in `core/bifrost.go` — add import + case in provider init switch
- UI integration (all required):
  - `ui/lib/constants/config.ts` — model placeholder + key requirement
  - `ui/lib/constants/icons.tsx` — provider icon
  - `ui/lib/constants/logs.ts` — provider display name (2 places)
  - `docs/openapi/openapi.json` — OpenAPI spec update
  - `transports/config.schema.json` — config schema (2 locations)
- CI/CD: Add env vars to `.github/workflows/pr-tests.yml` and `release-pipeline.yml` (4 jobs)
- Docs: Create `docs/providers/supported-providers/<name>.mdx`
- Test: `make test-core PROVIDER=<name>`
The make test-core target is the canonical harness for provider tests — it wires up env vars from .env (provider API keys), invokes the per-provider {provider}_test.go entrypoint in core/providers/<provider>/, and routes through the shared core/internal/llmtests/ scenario suite that validates end-to-end behavior (including streaming).
Running bare go test ./core/providers/<provider>/... only executes unit tests and skips the llmtests scenarios — so it won't catch regressions in streaming, tool-calling, or provider-specific response shapes.
make test-core PROVIDER=anthropic TESTCASE=TestChatCompletionStream  # exact test
make test-core PROVIDER=openai PATTERN=Stream                        # substring match
make test-core PROVIDER=bedrock                                      # all scenarios for one provider
make test-core DEBUG=1 PROVIDER=gemini TESTCASE=TestResponsesStream  # attach Delve on :2345

PATTERN and TESTCASE are mutually exclusive. Provider name must match a directory under core/providers/ (e.g. anthropic, openai, bedrock, vertex, azure, gemini, cohere, mistral, groq, etc.).
Scenario-based tests that run against live provider APIs with dual-API testing (Chat Completions + Responses API):
```go
func RunMyScenarioTest(t *testing.T, client *bifrost.Bifrost, ctx context.Context, cfg ComprehensiveTestConfig) {
    // Use validation presets: BasicChatExpectations(), ToolCallExpectations(), etc.
    // Use retry framework for flaky assertions
}
```

- Register in the `testScenarios` slice in `tests.go`
- Add a `Scenarios.MyScenario` flag to `ComprehensiveTestConfig`
- Run: `make test-core PROVIDER=<name> TESTCASE=<TestName>`
Mock-based tests with DynamicLLMMocker and declarative setup:
```go
manager, mocker, ctx := SetupAgentTest(t, AgentTestConfig{
    InProcessTools:   []string{"echo", "calculator"},
    AutoExecuteTools: []string{"*"},
    MaxDepth:         5,
})
// Queue mock LLM responses, assert tool execution order
```

Categories: `agent_*_test.go`, `tool_*_test.go`, `connection_*_test.go`, `codemode_*_test.go`
Run: make test-mcp TESTCASE=<TestName>
Playwright tests with page objects, data factories, fixtures:
- Page objects extend `BasePage`, use `getByTestId()` as the primary selector strategy
- Data factories use `Date.now()` for unique names (prevents collisions in parallel runs)
- Track created resources in arrays, clean up in `afterEach`
- Import `test`/`expect` from `../../core/fixtures/base.fixture` (never from `@playwright/test`)
- Never marshal API payloads to a `Record`/`Map`/plain object and then re-serialize. Field ordering matters for snapshot comparisons and some backend validations. Construct payloads as object literals with fields in the intended order and pass directly to Playwright's `request.post({ data })`. Do NOT destructure into an intermediate `Record<string, unknown>` or use `Object.fromEntries()` / `JSON.parse(JSON.stringify(...))` round-trips, as these can reorder fields.
Run: make run-e2e FLOW=<feature>
Four skills are available via /skill-name:
Write, update, or review Mintlify MDX documentation. Researches UI code, Go handlers, and config schema. Validates config.json examples against transports/config.schema.json. Outputs docs with Web UI / API / config.json tabs.
Variants: `/docs-writer update <doc-path>`, `/docs-writer review <doc-path>`
Create, run, debug, audit, or auto-update Playwright E2E tests.
Variants:
- `/e2e-test fix <spec>` — Debug and fix a failing test
- `/e2e-test sync` — Detect UI changes, update affected tests automatically
- `/e2e-test audit` — Scan specs for incorrect/weak assertions (P0-P6 severity scale)
Investigate a GitHub issue from maximhq/bifrost. Fetches issue details, classifies by type/area, searches codebase, traces dependencies, analyzes side effects, suggests tests (LLM/MCP/E2E), and presents an implementation plan with per-change approval gates.
Systematically address unresolved PR review comments. Uses GraphQL to get unresolved threads, presents each with FIX/REPLY/SKIP options, collects fixes locally, and only posts replies after code is pushed to remote.
- Change types in `core/schemas/chatcompletions.go`
- Update converter functions in each provider's `chat.go`
- If streaming is affected, update `framework/streaming/` (accumulator, delta copy)
- Run `make test-core` (all providers)
- Add to the schema type in `core/schemas/`
- Map in the provider response converter (`ToBifrost*Response`)
- Handle in the streaming accumulator if applicable
- Update the HTTP handler if the field needs special serialization
- Update `transports/config.schema.json` if configurable
- Create `plugins/<name>/` with its own `go.mod`
- Implement the `LLMPlugin`, `MCPPlugin`, or `HTTPTransportPlugin` interface
- Add to `go.work`
- Register in the transport layer or Bifrost config
- Add test targets to the `Makefile`
- Find the workspace page: `ui/app/workspace/<feature>/`
- Check existing `data-testid` attributes — E2E tests depend on them
- Add `data-testid` to new interactive elements
- Run `make run-e2e FLOW=<feature>` to verify
- If E2E tests break, use `/e2e-test sync` to update them
| What | Where |
|---|---|
| Main Bifrost struct & queuing | core/bifrost.go |
| Inference routing & fallbacks | core/inference.go |
| Provider interface (30+ methods) | core/schemas/provider.go |
| ModelProvider enum & context keys | core/schemas/bifrost.go |
| Plugin interfaces & pooled HTTP types | core/schemas/plugin.go |
| BifrostContext (mutable context) | core/schemas/context.go |
| Chat completion types | core/schemas/chatcompletions.go |
| Responses API types | core/schemas/responses.go |
| Object pool (prod + debug) | core/pool/pool_prod.go, pool_debug.go |
| Shared provider utils & SSE parsing | core/providers/utils/utils.go |
| Streaming accumulator | framework/streaming/accumulator.go |
| HTTP inference handler | transports/bifrost-http/handlers/inference.go |
| Governance handler | transports/bifrost-http/handlers/governance.go |
| Config schema (source of truth) | transports/config.schema.json |
| Pool debug profiler | transports/bifrost-http/handlers/devpprof.go |
| LLM test infrastructure | core/internal/llmtests/ |
| MCP test infrastructure | core/internal/mcptests/ |
| E2E test infrastructure | tests/e2e/core/ |
| Docs navigation config | docs/docs.json |
| CI/CD workflows | .github/workflows/ |
- Go: `gofmt`/`goimports`. No custom linter config.
- TypeScript/React: Oxfmt. TanStack Router.
- JSON tags: `snake_case`, matching provider API conventions.
- Error strings: Lowercase, no trailing punctuation (Go convention).
- Provider types: Prefixed with the provider name in PascalCase (`AnthropicChatRequest`, `GeminiEmbeddingResponse`).
- Converter functions: Pure — no side effects, no logging, no HTTP.
- Pool names: Descriptive string passed to `pool.New()` (e.g., `"channel-message"`, `"response-stream"`).
- Context keys: Use the `BifrostContextKey` type. Custom plugins should define their own key types to avoid collisions.
- Go filenames: No underscores. The only permitted underscore is the `_test.go` suffix. Examples: `pluginpipeline.go`, `pluginpipeline_test.go` — never `plugin_pipeline.go` or `plugin_pipeline_race_test.go`. Concatenate words (lowercase, no separators) for multi-word filenames.
This document defines the standards, structure, and best practices for writing frontend code in this project.
- React (with Vite)
- TypeScript
- @tanstack/react-router (type-safe routing)
- Tailwind CSS v4
- Radix UI (primitives)
- Local UI component library (`ui/components/ui/`) built on Radix primitives
/ui
├── app # Routes & pages
├── components # Shared components
│ └── ui # Core design system components
├── hooks # Custom React hooks
├── lib # Utilities, helpers, shared logic
└── app/enterprise # Enterprise-specific code (via symlink)
- All frontend code must live inside `/ui`
- Routes and pages → `ui/app`
- Shared/reusable components → `ui/components`
- Core UI primitives → `ui/components/ui`
- Utilities and libraries → `ui/lib`
- Custom hooks → `ui/hooks`
- `react` → UI library
- `typescript` → Type safety
- `tailwindcss` → Styling
- `@tanstack/react-router` → Routing
- `@radix-ui/react-*` → UI primitives
- `ui/components/ui/*` → Project's Radix-based component system
- `recharts` → Charts
- `monaco-editor` → Code editor
- `date-fns` → Date/time formatting
- `nuqs` → Query param state management
- Oxfmt → Code formatting
- `vitest` → Testing
For every new route:
ui/app/<route-name>/
├── layout.tsx # Route definition using createFileRoute
├── page.tsx # Page content
└── views/ # Optional: route-specific components
- Folder name must match route name
- Always use `createFileRoute` in `layout.tsx`
- `page.tsx` should only handle composition (not heavy logic)
- Route-specific components go inside `views/`
- Always check if similar components/functions already exist
- Prefer extending or refactoring existing code over duplication
- Only create new components if reuse is not feasible
- Shared → `ui/components`
- Route-specific → `views/` inside the route folder
- Avoid deeply nested conditional rendering
- Break complex UI into smaller components
- Keep components readable and maintainable
- Always use stable, unique keys
- Never use array index as key (unless unavoidable)
- Avoid unnecessary or unstable dependencies in hooks
- Prevent infinite loops in `useEffect`
- Keep dependency arrays accurate and minimal
- Prefer derived state over duplicated state
- Query params (`nuqs`) → for persistent/shareable state
- Local state → for UI-only state
- Redux → only when truly necessary
- Use for state that should persist across refresh/navigation
- Use proper parsers like `parseAsString` or `parseAsInteger`
- Do NOT mix query param state with local/Redux state
- Follow a single consistent pattern across the codebase
- Use only when global/shared state is required
- Avoid unnecessary slices
- Prefer simpler alternatives when possible
- Use for API calls and caching
- Use granular tags for cache invalidation
- Avoid invalidating entire datasets unnecessarily
- Implement optimistic updates where applicable
We use:
- `react-hook-form`
- `zod` v4 (for schema validation)
- Always define a Zod schema
- Include meaningful validation messages
- Prefer inline field errors (not toast notifications)
- Use `refine`/`superRefine` for complex validation
- Store schemas in `ui/lib/types/schemas.ts`
- Use `@tanstack/react-table` only for large/complex datasets
- For simple tables → build custom lightweight components
- Prioritize performance over abstraction
- Lazy load heavy or rarely-used libraries
- Avoid unnecessary re-renders
- Split large components into smaller ones
- Keep bundle size minimal
- Do NOT add new dependencies unless absolutely necessary
- Always pin exact versions (no `^` or `~`)
- Prefer existing libraries in the codebase
- Avoid using `any` unless absolutely unavoidable
- Prefer strict typing and inference
- Define reusable types in shared locations
After writing code:
cd ui && npm run format

Then verify the build:

cd ui && npm run build

- Code must pass formatting and build checks
- Follow consistent naming and structure conventions
- Duplicating components without considering reuse
- Mixing multiple state management approaches unnecessarily
- Overusing Redux
- Using unstable hook dependencies
- Adding heavy libraries for simple use cases
- Poorly structured or deeply nested JSX
- Prioritize reusability, performance, and consistency
- Follow strict folder structure and routing conventions
- Use the right tool for the right problem
- Keep code simple, predictable, and maintainable