Deployment Modes

ToolHive supports three distinct deployment modes, each optimized for different use cases and environments. This document provides a detailed explanation of how ToolHive operates in each mode.

Overview

graph LR
    subgraph LocalDeployment[Local Deployment]
        CLI[CLI Mode<br/>thv binary]
        UI[UI Mode<br/>ToolHive Studio]
    end

    subgraph KubernetesDeployment[Kubernetes Deployment]
        Operator[Operator Mode<br/>thv-operator]
    end

    CLI --> Docker[Docker/Podman<br/>Colima<br/>Rancher Desktop]
    UI --> Docker
    Operator --> K8s[Kubernetes]

    Docker --> MCP1[MCP Servers]
    K8s --> MCP2[MCP Servers]

    style LocalDeployment fill:#e1f5fe
    style KubernetesDeployment fill:#fff3e0

Mode Comparison

Feature	Local CLI	Local UI	Kubernetes
Binary	`thv`	`thv` (API server)	`thv-operator` + `thv-proxyrunner`
Container Runtime	Docker/Podman/Colima/Rancher	Docker/Podman/Colima/Rancher	Kubernetes
Process Management	Detached processes	API-managed	Operator-managed
State Storage	Local filesystem	Local filesystem	etcd (K8s API)
Scaling	Single machine	Single machine	Cluster-wide
Best For	Developers, CLI users	UI users, beginners	Production, multi-tenant

Local Mode: CLI

Architecture

graph TB
    User[User] -->|CLI Commands| THV[thv binary]
    THV -->|spawn detached| Proxy[Proxy Process]
    Proxy -->|Docker API| Runtime[Container Runtime<br/>Docker/Podman/Colima]
    Runtime -->|creates| Container[MCP Server Container]
    Proxy -->|stdin/stdout or HTTP| Container
    Client[MCP Client] -->|HTTP/SSE/Streamable| Proxy

    style THV fill:#90caf9
    style Proxy fill:#81c784
    style Container fill:#ffb74d

How It Works

User executes command: thv run server-name
ToolHive CLI (cmd/thv/main.go):
- Parses command-line arguments
- Loads or creates RunConfig
- Instantiates workloads API (pkg/workloads/manager.go)
Workload Manager:
- Detects available container runtime (Podman → Colima → Docker)
- Creates container via Runtime API
- Spawns detached proxy process
Proxy Process:
- Runs as independent process (via thv restart --foreground)
- Attaches to container (for stdio) or forwards HTTP traffic
- Applies middleware chain
- Exposes local HTTP endpoint for MCP clients
State Management:
- RunConfig saved to ~/.toolhive/state/ (or XDG equivalent)
- PID file for process management
- Status file for workload state tracking

Container Runtime Selection

Implementation: pkg/container/factory.go

The CLI automatically detects container runtimes in this order:

Podman - Checks for Podman socket at:
- $TOOLHIVE_PODMAN_SOCKET (if set)
- /var/run/podman/podman.sock
- $XDG_RUNTIME_DIR/podman/podman.sock
- ~/.local/share/containers/podman/machine/podman.sock (Podman Machine on macOS)
- $TMPDIR/podman/*-api.sock (Podman Machine API on macOS)
Colima - Checks for Colima socket at:
- $TOOLHIVE_COLIMA_SOCKET (if set)
- ~/.colima/default/docker.sock
Docker (including Docker Desktop, Rancher Desktop, and OrbStack) - Checks for Docker socket at:
- $TOOLHIVE_DOCKER_SOCKET (if set)
- /var/run/docker.sock
- ~/.docker/run/docker.sock (Docker Desktop on macOS)
- ~/.rd/docker.sock (Rancher Desktop on macOS)
- ~/.orbstack/run/docker.sock (OrbStack on macOS)

Detached Process Model

When running in detached mode (thv run without --foreground):

sequenceDiagram
    participant User
    participant THV as thv (parent)
    participant THV2 as thv restart<br/>(detached child)
    participant Container

    User->>THV: thv run server-name
    THV->>THV: Save RunConfig to state
    THV->>THV2: Fork: thv restart --foreground
    Note over THV2: Detached process<br/>with new session
    THV->>User: Return (PID written)
    THV2->>Container: Attach or proxy
    Container->>THV2: MCP traffic
    THV2->>THV2: Apply middleware
    Note over THV2: Runs indefinitely

Key Implementation:

pkg/workloads/manager.go - RunWorkloadDetached method
Uses exec.Command with SysProcAttr to detach
Sets TOOLHIVE_DETACHED=true environment variable
Redirects stdout/stderr to log file: ~/.toolhive/logs/<workload>.log

File Locations

Purpose	Path (Linux)	Path (macOS)
State files (RunConfig)	`~/.local/state/toolhive/`	`~/Library/Application Support/toolhive/`
Data files (logs, PIDs, secrets, statuses)	`~/.local/share/toolhive/`	`~/Library/Application Support/toolhive/`
Config files	`~/.config/toolhive/`	`~/Library/Application Support/toolhive/`
Cache files	`~/.cache/toolhive/`	`~/Library/Caches/toolhive/`

Implementation: Uses adrg/xdg package for XDG Base Directory compliance.

Local Mode: UI

Architecture

graph TB
    User[User] -->|Web Browser| Studio[ToolHive Studio<br/>Web UI]
    Studio -->|REST API| APIServer[thv serve<br/>API Server]
    APIServer -->|Internal| Workloads[Workloads Manager]
    Workloads -->|Runtime API| Runtime[Container Runtime<br/>Docker/Podman/Rancher]
    Runtime -->|creates| Container[MCP Server Container]
    Container -->|managed by| Proxy[Proxy Process]
    Client[MCP Client] -->|HTTP| Proxy

    style Studio fill:#ba68c8
    style APIServer fill:#90caf9
    style Proxy fill:#81c784
    style Container fill:#ffb74d

How It Works

User starts UI: ToolHive Studio application launches
Studio spawns API server: thv serve
- Starts HTTP API server on configurable port (default: 8080)
- Exposes RESTful endpoints for workload management
API Server (pkg/api/server.go):
- Handles HTTP requests from UI
- Delegates to Workloads Manager
- Returns JSON responses
Workload Operations:
- Create: POST /api/v1beta/workloads
- List: GET /api/v1beta/workloads
- Stop: POST /api/v1beta/workloads/{name}/stop
- Delete: DELETE /api/v1beta/workloads/{name}
- Logs: GET /api/v1beta/workloads/{name}/logs
Runtime Selection:
- Picks runtime driver based on environment
- Docker, Podman, or Rancher Desktop
- Uses driver API to spawn containers

API Endpoints

Full API documentation available at:

OpenAPI spec: pkg/api/openapi.go
Interactive docs: http://localhost:8080/api/doc (Scalar UI)

Key endpoints:

/api/v1beta/workloads - Workload management
/api/v1beta/registry - Registry browsing
/api/v1beta/clients - Client configuration
/api/v1beta/groups - Group management

Differences from CLI Mode

Aspect	CLI Mode	UI Mode
Process Model	Detached child process	Managed by API server
State Access	Direct filesystem	Via API server
Authentication	None (local user)	Optional (configurable)
Middleware Config	CLI flags or config file	API requests
Runtime Selection	Automatic detection	User selectable in UI

Kubernetes Mode: Operator

Architecture

graph TB
    User[User] -->|kubectl apply| K8s[Kubernetes API]
    K8s -->|watch| Operator[thv-operator]
    Operator -->|create| Deploy[Deployment<br/>thv-proxyrunner]
    Operator -->|create| SVC[Service]
    Deploy -->|create| STS[StatefulSet<br/>MCP Server]
    Deploy -->|proxy to| STS
    Client[MCP Client] -->|HTTP| SVC
    SVC -->|route to| Deploy

    style Operator fill:#5c6bc0
    style Deploy fill:#90caf9
    style STS fill:#ffb74d

How It Works

User applies CRD: kubectl apply -f mcpserver.yaml
Operator watches resources (cmd/thv-operator/controllers/mcpserver_controller.go):
- Watches for MCPServer custom resources
- Reconciles desired state vs actual state
Operator creates Deployment:
- Runs thv-proxyrunner container
- Mounts RunConfig as ConfigMap or secret
- Applies middleware configuration
Proxy runner creates StatefulSet:
- Uses Kubernetes API (in-cluster client)
- Creates StatefulSet with MCP server container
- Manages container lifecycle
Proxy runner proxies traffic:
- Receives requests on exposed port
- Applies middleware chain
- Forwards to StatefulSet pod(s)
Operator creates Service:
- Exposes proxy runner Deployment
- LoadBalancer, ClusterIP, or NodePort
- Routes external traffic to proxy

Why Two Binaries?

thv-operator (cmd/thv-operator/):

Watches Kubernetes API for CRDs
Reconciles desired vs actual state
Creates Kubernetes resources (Deployments, Services, ConfigMaps)
Does NOT run the proxy or create containers directly

thv-proxyrunner (cmd/thv-proxyrunner/):

Runs as a container in the Deployment
Creates containers via Kubernetes API (StatefulSets)
Applies middleware and proxies MCP traffic
Handles transport-specific communication

Why not use thv in Kubernetes?

thv is optimized for local Docker/Podman API usage
Kubernetes requires different container creation logic (StatefulSets vs standalone containers)
Separation of concerns: operator manages K8s resources, proxy-runner manages MCP traffic

Deployment Pattern

graph LR
    subgraph "Namespace: default"
        Deploy[Deployment<br/>proxy-runner<br/>Replicas: 1]
        SVC[Service<br/>proxy-svc]
        STS[StatefulSet<br/>mcp-server<br/>Replicas: 1]
    end

    Deploy -->|manages| STS
    SVC -->|routes to| Deploy
    Deploy -.->|watches| STS

    style Deploy fill:#90caf9
    style STS fill:#ffb74d
    style SVC fill:#81c784

Custom Resource Definitions

ToolHive provides several CRDs for managing MCP servers in Kubernetes:

MCPServer - Defines an MCP server deployment with container images, transports, and middleware
MCPRegistry - Manages MCP server registries from Git or ConfigMap sources

For complete examples, see the examples/operator/mcp-servers/ directory, which includes:

Basic MCP server deployments with different transports (stdio, SSE, streamable-http)
Authentication configurations (inline OIDC, ConfigMap-based, Kubernetes-native)
Resource and pod template customizations
Tool filtering and middleware examples

Full CRD API documentation is available in docs/operator/crd-api.md.

Operator Design Decisions

See cmd/thv-operator/DESIGN.md for detailed decision documentation.

Key principles:

Use CRD attributes for business logic affecting reconciliation
Use PodTemplateSpec for infrastructure concerns (node selection, resources)
Separate sync decision logic from sync execution
Batch status updates to reduce API server load

State Management

Unlike local mode, Kubernetes mode stores state in:

etcd (via Kubernetes API)
ConfigMaps for RunConfig
Secrets for sensitive data (OIDC client secrets, etc.)
Status subresources for workload state

No local filesystem state required.

Scaling Considerations

Proxy runner:

Typically runs with 1 replica
Multiple replicas may be possible with session affinity (not currently tested)
Note: stdio transport requires single proxy instance due to exclusive stdin/stdout attachment

MCP server (StatefulSet):

Scales independently from proxy (for SSE/Streamable HTTP transports)
Stable network identities
Persistent storage can be configured if needed

Operator:

Single instance with leader election
Watches cluster-wide or namespace-scoped

Mode-Specific Implementation Details

Workloads API Abstraction

The workloads API (pkg/workloads/manager.go) provides a unified interface across all modes:

type Manager interface {
    RunWorkload(ctx context.Context, runConfig *runner.RunConfig) error
    RunWorkloadDetached(ctx context.Context, runConfig *runner.RunConfig) error
    StopWorkloads(ctx context.Context, names []string) (*errgroup.Group, error)
    DeleteWorkloads(ctx context.Context, names []string) (*errgroup.Group, error)
    ListWorkloads(ctx context.Context, listAll bool, labelFilters ...string) ([]core.Workload, error)
    GetWorkload(ctx context.Context, workloadName string) (core.Workload, error)
    // ... more methods
}

Mode-specific behavior is abstracted through:

Runtime interface (pkg/container/runtime/types.go)
Factory pattern for runtime selection (pkg/container/factory.go)

Runtime Abstraction

classDiagram
    class Runtime {
        <<interface>>
        +DeployWorkload()
        +StopWorkload()
        +RemoveWorkload()
        +ListWorkloads()
        +GetWorkloadInfo()
    }

    class DockerRuntime {
        +DeployWorkload()
        +StopWorkload()
        ...
    }

    class KubernetesRuntime {
        +DeployWorkload()
        +StopWorkload()
        ...
    }

    Runtime <|-- DockerRuntime
    Runtime <|-- KubernetesRuntime

Implementation files:

Docker: pkg/container/docker/ (implementation details in Docker engine integration)
Kubernetes: Operator uses Kubernetes API directly, not the Runtime interface

RunConfig Portability

The RunConfig format (pkg/runner/config.go) is designed to be portable across all modes:

Local → Local: Direct JSON export/import via:

thv export <workload> <output-file> → saves RunConfig JSON
thv run --from-config <file> → loads RunConfig JSON

Local → Kubernetes: Manual conversion:

Export RunConfig from local workload
Convert to MCPServer CRD YAML (tool support planned)
Apply to cluster

Kubernetes → Kubernetes: Direct CRD replication

Environment Detection

Implementation: pkg/container/runtime/types.go

ToolHive automatically detects runtime environment:

func IsKubernetesRuntime() bool {
    // Check TOOLHIVE_RUNTIME env var
    if runtimeEnv := os.Getenv("TOOLHIVE_RUNTIME"); runtimeEnv == "kubernetes" {
        return true
    }
    // Check if running in K8s pod
    return os.Getenv("KUBERNETES_SERVICE_HOST") != ""
}

This allows the same codebase to behave appropriately in different environments.

Choosing a Deployment Mode

Use Local CLI Mode When:

Developing MCP servers locally
Quick testing and iteration
Single-user environment
No need for web UI

Use Local UI Mode When:

Non-technical users need access
Visual management preferred
Local development with GUI
Multiple users on same machine (API can be shared)

Use Kubernetes Mode When:

Production deployments
Multi-tenant requirements
Need horizontal scaling
HA and resilience required
Integration with existing K8s infrastructure
Centralized management of many MCP servers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deployment Modes

Overview

Mode Comparison

Local Mode: CLI

Architecture

How It Works

Container Runtime Selection

Detached Process Model

File Locations

Local Mode: UI

Architecture

How It Works

API Endpoints

Differences from CLI Mode

Kubernetes Mode: Operator

Architecture

How It Works

Why Two Binaries?

Deployment Pattern

Custom Resource Definitions

Operator Design Decisions

State Management

Scaling Considerations

Mode-Specific Implementation Details

Workloads API Abstraction

Runtime Abstraction

RunConfig Portability

Environment Detection

Choosing a Deployment Mode

Use Local CLI Mode When:

Use Local UI Mode When:

Use Kubernetes Mode When:

Migration Paths

Local → Kubernetes

Kubernetes → Local

Related Documentation

FilesExpand file tree

01-deployment-modes.md

Latest commit

History

01-deployment-modes.md

File metadata and controls

Deployment Modes

Overview

Mode Comparison

Local Mode: CLI

Architecture

How It Works

Container Runtime Selection

Detached Process Model

File Locations

Local Mode: UI

Architecture

How It Works

API Endpoints

Differences from CLI Mode

Kubernetes Mode: Operator

Architecture

How It Works

Why Two Binaries?

Deployment Pattern

Custom Resource Definitions

Operator Design Decisions

State Management

Scaling Considerations

Mode-Specific Implementation Details

Workloads API Abstraction

Runtime Abstraction

RunConfig Portability

Environment Detection

Choosing a Deployment Mode

Use Local CLI Mode When:

Use Local UI Mode When:

Use Kubernetes Mode When:

Migration Paths

Local → Kubernetes

Kubernetes → Local

Related Documentation