The ToolHive operator manages MCP servers in Kubernetes clusters using custom resources and the operator pattern. This document explains the operator's design, components, and reconciliation logic.
Why two binaries?
thv-operator: Watches CRDs, reconciles Kubernetes resourcesthv-proxyrunner: Runs in pods, creates containers, proxies traffic
This separation provides clear responsibility boundaries and enables independent scaling.
Implementation: cmd/thv-operator/, cmd/thv-proxyrunner/
graph TB
User[User] -->|kubectl apply| API[Kubernetes API]
API -->|watch| Operator[thv-operator]
Operator -->|create| Deploy[Deployment<br/>thv-proxyrunner]
Operator -->|create| SVC[Service]
Operator -->|create| CM[ConfigMap<br/>RunConfig]
Deploy -->|mount| CM
Deploy -->|create| STS[StatefulSet<br/>MCP Server]
Deploy -->|proxy to| STS
Client[MCP Client] --> SVC
SVC --> Deploy
style Operator fill:#5c6bc0
style Deploy fill:#90caf9
style STS fill:#ffb74d
Defines an MCP server deployment, including container images, transports, middleware, and authentication configuration.
Implementation: cmd/thv-operator/api/v1alpha1/mcpserver_types.go
MCPServer resources support various transport types (stdio, SSE, streamable-http), permission profiles, OIDC authentication, and Cedar-based authorization policies. The operator reconciles these resources into Kubernetes Deployments, Services, and StatefulSets.
Status fields include phase (Running, Pending, Failed, Terminating) and the accessible URL for the MCP server.
For examples, see:
examples/operator/mcp-servers/mcpserver_github.yaml- Basic GitHub MCP serverexamples/operator/mcp-servers/mcpserver_with_configmap_oidc.yaml- With OIDC authenticationexamples/operator/mcp-servers/mcpserver_with_pod_template.yaml- With pod customizations
Manages MCP server registries in Kubernetes, supporting both Git-based and ConfigMap-based registry sources with automatic or manual synchronization.
Implementation: cmd/thv-operator/api/v1alpha1/mcpregistry_types.go
MCPRegistry resources can sync registry data from external sources and optionally deploy a registry API service for serving the registry data to other components.
Controller: cmd/thv-operator/controllers/mcpregistry_controller.go
For examples, see the examples/operator/ directory.
Defines tool filtering and override configuration.
Implementation: cmd/thv-operator/api/v1alpha1/toolconfig_types.go
MCPToolConfig allows you to filter which tools are exposed by an MCP server and customize tool metadata. See examples/operator/mcp-servers/mcpserver_fetch_tools_filter.yaml for a complete example.
Referenced by MCPServer using toolConfigRef.
Controller: cmd/thv-operator/controllers/toolconfig_controller.go
Manages external authentication configurations that can be shared across multiple MCPServer resources.
Implementation: cmd/thv-operator/api/v1alpha1/mcpexternalauthconfig_types.go
MCPExternalAuthConfig allows you to define reusable OIDC authentication configurations that can be referenced by multiple MCPServer resources. This is useful for sharing authentication settings across servers.
Referenced by MCPServer using oidcConfig.type: external.
Controller: cmd/thv-operator/controllers/mcpexternalauthconfig_controller.go
Defines a proxy for remote MCP servers with authentication, authorization, audit logging, and tool filtering.
Key fields:
remoteURL- URL of the remote MCP server to proxyoidcConfig- OIDC authentication for incoming requests (required)externalAuthConfigRef- Token exchange for remote service authenticationauthzConfig- Authorization policiestoolConfigRef- Tool filtering and renaming
Implementation: cmd/thv-operator/api/v1alpha1/mcpremoteproxy_types.go
Controller: cmd/thv-operator/controllers/mcpremoteproxy_controller.go
Logically groups MCPServer resources together for organizational purposes.
Implementation: cmd/thv-operator/api/v1alpha1/mcpgroup_types.go
MCPGroup resources allow grouping related MCP servers. Servers reference their group using the groupRef field in MCPServer spec. The group tracks member servers in its status.
Status fields include phase (Ready, Pending, Failed), list of server names, and server count.
Referenced by MCPServer using spec.groupRef.
Controller: cmd/thv-operator/controllers/mcpgroup_controller.go
Aggregates multiple MCPServer resources from an MCPGroup into a single unified MCP server interface with advanced composition capabilities.
Implementation: cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go
VirtualMCPServer creates a virtual MCP server that aggregates tools, resources, and prompts from multiple backend MCPServers. It provides:
Key capabilities:
- Backend Discovery: Automatically discovers MCPServers from a referenced MCPGroup
- Tool Aggregation: Aggregates tools from multiple backends with configurable conflict resolution (prefix, priority, manual)
- Tool Filtering: Selective tool exposure with allow/deny lists and rewriting rules
- Composite Tools: Create new tools that orchestrate calls across multiple backend tools
- Incoming Authentication: OIDC and authorization policies for clients connecting to the virtual server
- Outgoing Authentication: Automatic token exchange and authentication to backend servers
- Token Caching: Configurable token caching with TTL and capacity limits
- Operational Controls: Health check intervals, failure handling, and backend retry logic
Architecture:
┌─────────────┐
│ Clients │
└──────┬──────┘
│
│ (OIDC auth)
▼
┌────────────────────────┐
│ VirtualMCPServer │
│ - Tool Aggregation │
│ - Conflict Resolution │
│ - Composite Tools │
│ - Token Exchange │
└────────┬───────────────┘
│
├──────────┬──────────┬──────────┐
▼ ▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Backend1│ │Backend2│ │Backend3│ │Backend4│
│MCPSrvr │ │MCPSrvr │ │MCPSrvr │ │MCPSrvr │
└────────┘ └────────┘ └────────┘ └────────┘
(Discovered from MCPGroup)
Status fields include:
- Phase (Ready, Degraded, Pending, Failed)
- URL for accessing the virtual server
- Discovered backends with individual health status
- Capabilities summary (tool/resource/prompt counts)
- Detailed conditions for validation, discovery, and readiness
Referenced by: MCPGroup (via spec.groupRef)
Controller: cmd/thv-operator/controllers/virtualmcpserver_controller.go
Key features:
-
Conflict Resolution Strategies:
prefix: Prefix tool names with backend identifierpriority: First backend in priority order wins conflictsmanual: Explicitly define which backend wins each conflict
-
Composite Tools: Define new tools that orchestrate multiple backend tool calls with parameter mapping and response aggregation
-
Watch Optimization: Targeted reconciliation - only reconciles VirtualMCPServers affected by backend changes, not all servers in the namespace
-
Status Reconciliation: Robust status updates with conflict handling following Kubernetes optimistic concurrency control patterns
-
Backend Health Monitoring: Periodic health checks with configurable intervals and automatic status updates
For examples, see the examples/operator/ directory.
For complete examples of all CRDs, see the examples/operator/mcp-servers/ directory.
Reconciliation loop:
- Watch MCPServer resources
- Get desired state from CRD spec
- Get current state from cluster
- Compare desired vs current
- Reconcile - Create, update, or delete resources
- Update status with result
Implementation: cmd/thv-operator/controllers/mcpserver_controller.go
For each MCPServer, operator creates:
-
Deployment (proxy-runner)
- Runs
thv-proxyrunnerimage - Mounts RunConfig as ConfigMap
- Applies middleware configuration
- Runs
-
StatefulSet (MCP server)
- Created by proxy-runner
- Runs actual MCP server image
- Stable network identity
-
Service
- Exposes proxy deployment
- Type: ClusterIP, LoadBalancer, or NodePort
- Routes traffic to proxy
-
ConfigMap (RunConfig)
- Contains serialized RunConfig
- Mounted into proxy-runner pod
-
ServiceAccount (optional)
- For RBAC permissions
- Pod identity
graph LR
subgraph "Namespace: default"
Deploy["Deployment (proxy)<br/>Replicas: 1<br/>thv-proxyrunner"]
SVC["Service<br/>Type: ClusterIP"]
STS["StatefulSet (mcp)<br/>Replicas: 1<br/>MCP Server"]
CM["ConfigMap<br/>RunConfig"]
end
Deploy -->|manages| STS
Deploy -->|mounts| CM
SVC -->|routes to| Deploy
style Deploy fill:#90caf9
style STS fill:#ffb74d
style SVC fill:#81c784
style CM fill:#e3f2fd
Purpose: Runs inside Deployment pod, creates and proxies to MCP server
Responsibilities:
- Read RunConfig from mounted ConfigMap
- Create StatefulSet with MCP server
- Wait for StatefulSet to be ready
- Start transport and proxy
- Apply middleware chain
- Forward traffic to StatefulSet pods
Command:
thv-proxyrunner runEnvironment:
KUBERNETES_SERVICE_HOST- Detects K8s environment- RunConfig path from mount
- In-cluster Kubernetes client
Implementation: cmd/thv-proxyrunner/app/commands.go
From: cmd/thv-operator/DESIGN.md
Use CRD attributes for:
- Business logic affecting reconciliation
- Validation requirements
- Cross-resource coordination
- Operator decision making
Use PodTemplateSpec for:
- Infrastructure concerns (node selection, resources, affinity)
- Sidecar containers
- Standard Kubernetes pod configuration
- Cluster admin configurations
Examples:
CRD attribute:
spec:
transport: sse # Affects operator logic
port: 8080 # Affects Service creationPodTemplateSpec:
spec:
podTemplateSpec:
spec:
nodeSelector:
disktype: ssd # Infrastructure concernPattern: Batched updates via StatusCollector
Why: Prevents race conditions, reduces API calls
Implementation: cmd/thv-operator/pkg/mcpregistrystatus/
Architecture:
graph TB
MCPReg[MCPRegistry CRD] --> Controller[Controller]
Controller --> Source[Source Handler]
Source -->|git| Git[Git Clone]
Source -->|configmap| CM[Read ConfigMap]
Git --> Storage[Storage Manager]
CM --> Storage
Storage --> ConfigMap[ConfigMap Storage]
Controller --> API[Registry API Service]
API --> Deploy[Deployment]
API --> SVC[Service]
style Controller fill:#5c6bc0
style Storage fill:#e3f2fd
style API fill:#ba68c8
Git source: cmd/thv-operator/pkg/sources/git.go
- Clones repository
- Reads registry.json
- Calculates hash for change detection
ConfigMap source: cmd/thv-operator/pkg/sources/configmap.go
- Reads from existing ConfigMap
- Watches for updates
Storage Manager: cmd/thv-operator/pkg/sources/storage_manager.go
- Creates ConfigMap with key
registry.jsoncontaining full registry data - Sync metadata (timestamp, hash, attempt count) stored in MCPRegistry CRD status field
SyncStatus
Interface: cmd/thv-operator/pkg/sources/types.go
Purpose: Persist registry data in cluster
Implementation: cmd/thv-operator/pkg/sources/storage_manager.go
Storage: ConfigMap with owner reference
Format:
data:
registry.json: |
{ full registry data }Sync metadata (timestamp, hash, attempt count) is stored in the MCPRegistry CRD status field SyncStatus, not in the ConfigMap.
Automatic sync:
spec:
syncPolicy:
interval: 1hOperator syncs every hour. The presence of syncPolicy with an interval enables automatic synchronization.
Manual sync:
Omit the syncPolicy field entirely.
Trigger: Add or update annotation toolhive.stacklok.dev/sync-trigger=<unique-value> where the value can be any non-empty string. The operator triggers sync when this value changes, allowing multiple manual syncs by using different values (e.g., timestamps, counters).
When enabled, operator creates:
- Deployment running
thv-registry-api - Service exposing API
- ConfigMap mount with registry data
Implementation: cmd/thv-operator/pkg/registryapi/service.go
OIDC config:
Both patterns are supported:
# Pattern 1: Direct data fields (as shown in examples/operator/mcp-servers/mcpserver_with_configmap_oidc.yaml)
# ConfigMap contains: issuer, audience, clientId as direct fields
spec:
oidcConfig:
type: configMap
configMap:
name: oidc-config
# Pattern 2: JSON file with key parameter
# ConfigMap contains a JSON file at the specified key
spec:
oidcConfig:
type: configMap
configMap:
name: oidc-config
key: oidc.json # defaults to oidc.json if omittedAuthz policies:
spec:
authzConfig:
type: configMap
configMap:
name: authz-policies
key: authz.json # defaults to authz.json if omittedOIDC:
spec:
oidcConfig:
type: inline
inline:
issuer: https://auth.example.com
audience: my-appAuthz:
spec:
authzConfig:
type: inline
inline:
policies:
- permit(principal, action, resource);- Deployment Modes - Kubernetes mode details
- Core Concepts - Operator concepts
- Registry System - MCPRegistry CRD
- Operator Design:
cmd/thv-operator/DESIGN.md