
13.3 Production Configuration

Nikolay Vyahhi edited this page Feb 19, 2026 · 2 revisions



This page covers production deployment configuration best practices for ZeroClaw, including security hardening, resource limits, monitoring, and operational tuning. For deployment methods, see Docker Deployment and Native Binary Deployment. For the configuration file reference, see Configuration File Reference.


Configuration Hierarchy

ZeroClaw uses a three-tier configuration system with environment variables taking precedence over config.toml, which takes precedence over built-in defaults.

Configuration Priority

flowchart LR
    A["Environment Variables"] -->|highest priority| B["config.toml"]
    B --> C["Built-in Defaults"]
    C -->|lowest priority| D["Runtime Behavior"]

Sources: README.md:492-599, src/security/secrets.rs:1-227
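
The precedence chain above can be sketched as a std-only lookup. This is an illustrative sketch, not the actual resolver: `resolve_port` and the `ZEROCLAW_PORT` variable name are assumptions.

```rust
use std::env;

// Hypothetical resolver illustrating the three-tier precedence:
// environment variable > config.toml value > built-in default.
// `ZEROCLAW_PORT` is an assumed variable name for this sketch.
fn resolve_port(config_value: Option<u16>) -> u16 {
    env::var("ZEROCLAW_PORT")
        .ok()
        .and_then(|v| v.parse().ok()) // 1. environment variable (highest)
        .or(config_value)             // 2. config.toml value
        .unwrap_or(3000)              // 3. built-in default (lowest)
}
```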

Key Configuration Files

| File Path | Purpose | Required |
|---|---|---|
| ~/.zeroclaw/config.toml | Primary configuration | Yes |
| ~/.zeroclaw/.secret_key | Secret encryption key | Auto-created |
| ~/.zeroclaw/auth-profiles.json | OAuth profiles (encrypted) | Optional |
| workspace/MEMORY_SNAPSHOT.md | Memory backup | Auto-generated |

Sources: README.md:492-599, src/security/secrets.rs:36-51


Security Hardening

Five-Layer Security Model

flowchart TD
    Request["Incoming Request"] --> L1["Layer 1: Network Isolation"]
    L1 --> L2["Layer 2: Authentication"]
    L2 --> L3["Layer 3: Authorization"]
    L3 --> L4["Layer 4: Execution Isolation"]
    L4 --> L5["Layer 5: Data Protection"]
    
    L1 --> L1A["127.0.0.1 bind<br/>Tunnel required"]
    L2 --> L2A["PairingGuard<br/>Bearer tokens"]
    L3 --> L3A["SecurityPolicy<br/>Autonomy levels<br/>Allowlists"]
    L4 --> L4A["RuntimeAdapter<br/>Docker sandbox"]
    L5 --> L5A["SecretStore<br/>ChaCha20-Poly1305"]

Sources: README.md:380-431, src/security/pairing.rs:1-231, src/security/secrets.rs:1-227

Secret Management

ZeroClaw encrypts secrets at rest using ChaCha20-Poly1305 AEAD with a local key file.

Configuration:

[secrets]
encrypt = true  # Enable encryption (default: true)

Key File Location: ~/.zeroclaw/.secret_key (mode 0600)

Encryption Format:

  • Current: enc2:<hex(nonce || ciphertext || tag)> (ChaCha20-Poly1305)
  • Legacy: enc:<hex(xor_ciphertext)> (auto-migrates on load)

Migration Detection:

// Check if secret needs upgrade from legacy XOR
SecretStore::needs_migration(value)  // Returns true for "enc:" prefix
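
Given the prefix semantics above, the check reduces to a string-prefix test. A minimal std-only sketch (not the actual `SecretStore` code):

```rust
// Legacy XOR values carry an `enc:` prefix; the current AEAD format uses
// `enc2:`, which does not match `enc:` (the fourth character differs),
// so a plain prefix test suffices for this sketch.
fn needs_migration(value: &str) -> bool {
    value.starts_with("enc:")
}
```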

Production Recommendation: Always enable encryption. For compliance scenarios requiring plaintext (audit logs, SIEM integration), set encrypt = false and use external secret management (Vault, AWS Secrets Manager).

Sources: src/security/secrets.rs:1-283, README.md:556-558

Gateway Authentication

Production gateway configuration enforces two-factor protection: one-time pairing code plus bearer token.

[gateway]
port = 3000
host = "127.0.0.1"              # Localhost-only binding
require_pairing = true           # Enforce pairing (default: true)
allow_public_bind = false        # Refuse 0.0.0.0 without tunnel

Pairing Flow:

sequenceDiagram
    participant G as Gateway
    participant C as Client
    participant PG as PairingGuard
    
    G->>PG: Generate 6-digit code on startup
    G->>G: Print code to console
    C->>G: POST /pair<br/>X-Pairing-Code: 123456
    G->>PG: try_pair(code)
    PG->>PG: Verify code (constant-time compare)
    PG->>PG: Generate bearer token (zc_<64-hex>)
    PG->>PG: Hash token (SHA-256) for storage
    PG-->>C: {"token": "zc_..."}
    C->>G: POST /webhook<br/>Authorization: Bearer zc_...
    G->>PG: is_authenticated(token)
    PG->>PG: Compare SHA-256(token) against stored hash
    PG-->>G: true/false

Sources: src/security/pairing.rs:36-151, README.md:525-530
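
The constant-time compare step in the diagram can be sketched as an XOR-accumulating loop. This is illustrative only; the real guard lives in src/security/pairing.rs.

```rust
// Compare two byte strings without short-circuiting on the first mismatch,
// so response timing does not leak how many leading bytes matched.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```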

Brute Force Protection:

  • Max attempts: 5 failed pairing attempts
  • Lockout duration: 300 seconds (5 minutes)
  • Token storage: SHA-256 hashes only (no plaintext)

Production Best Practice: Rotate bearer tokens periodically using zeroclaw auth commands.

Sources: src/security/pairing.rs:16-36
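
The lockout bookkeeping implied by these limits can be sketched as follows; `AttemptTracker` is a hypothetical struct for illustration, not the actual PairingGuard implementation.

```rust
use std::time::{Duration, Instant};

// Tracks failed pairing attempts and locks out after the limits above:
// 5 failures trigger a 300-second lockout.
struct AttemptTracker {
    failures: u32,
    locked_until: Option<Instant>,
}

impl AttemptTracker {
    const MAX_ATTEMPTS: u32 = 5;
    const LOCKOUT: Duration = Duration::from_secs(300);

    fn new() -> Self {
        Self { failures: 0, locked_until: None }
    }

    fn is_locked(&self) -> bool {
        self.locked_until.map_or(false, |t| Instant::now() < t)
    }

    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= Self::MAX_ATTEMPTS {
            self.locked_until = Some(Instant::now() + Self::LOCKOUT);
        }
    }
}
```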

Channel Allowlists

Production deployments should use explicit allowlists for all channels to prevent unauthorized access.

Default Behavior: Empty allowlist = deny all inbound messages

[channels_config.telegram]
allowed_users = ["alice_username", "123456789"]  # Username or numeric ID

[channels_config.discord]
allowed_users = ["987654321098765432"]  # Discord user ID

[channels_config.slack]
allowed_users = ["U01234ABC"]  # Slack member ID

Wildcard (Testing Only):

allowed_users = ["*"]  # ⚠️ Allow all — use only for temporary testing

Sources: README.md:394-438

Filesystem Scoping

[autonomy]
workspace_only = true  # Restrict to workspace directory (default: true)
forbidden_paths = [
    "/etc", "/root", "/proc", "/sys",
    "~/.ssh", "~/.gnupg", "~/.aws"
]

Built-in Protections:

  • 14 system directories blocked by default
  • Null byte injection detection
  • Symlink escape prevention via path canonicalization

Sources: README.md:385-390
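
A sketch of how these protections compose, assuming workspace_only = true: reject null bytes, then canonicalize and require the result to stay under the workspace root, which defeats symlink escapes. `is_path_allowed` is a hypothetical helper, not the actual SecurityPolicy code.

```rust
use std::path::{Path, PathBuf};

// Returns true only if `candidate` resolves to a path inside `workspace`.
fn is_path_allowed(workspace: &Path, candidate: &str) -> bool {
    if candidate.contains('\0') {
        return false; // null byte injection
    }
    // Canonicalization resolves symlinks, so a link pointing outside the
    // workspace fails the prefix check below.
    let resolved: PathBuf = match Path::new(candidate).canonicalize() {
        Ok(p) => p,
        Err(_) => return false, // nonexistent or unreadable path
    };
    resolved.starts_with(workspace)
}
```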


Resource Management

Docker Resource Constraints

Production docker-compose.yml example with resource limits:

services:
  zeroclaw:
    image: ghcr.io/zeroclaw-labs/zeroclaw:latest
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M

Sources: docker-compose.yml:42-50

Runtime Adapter Settings

[runtime]
kind = "docker"  # "native" or "docker"

[runtime.docker]
image = "alpine:3.20"
network = "none"              # Isolate from network
memory_limit_mb = 512         # Hard limit
cpu_limit = 1.0               # CPU shares (1.0 = 1 core)
read_only_rootfs = true       # Immutable root filesystem
mount_workspace = true        # Mount workspace at /workspace

Production Recommendation: Use docker runtime for untrusted tool execution. Use native for trusted environments where Docker overhead is unacceptable.

Sources: README.md:537-548

Memory Backend Selection

graph LR
    A["Workload Type"] --> B{"Data Volume"}
    B -->|< 10k memories| C["sqlite<br/>Full-text search<br/>Hybrid vector + keyword"]
    B -->|> 100k memories| D["postgres<br/>Remote persistence<br/>Multi-agent shared state"]
    B -->|Ephemeral| E["none<br/>No-op backend<br/>Stateless operation"]

Sources: README.md:330-377

Configuration:

[memory]
backend = "sqlite"            # "sqlite", "postgres", "lucid", "markdown", "none"
auto_save = true              # Auto-persist conversations

# PostgreSQL example
[storage.provider.config]
provider = "postgres"
db_url = "postgres://user:pass@host:5432/zeroclaw"
schema = "public"
table = "memories"
connect_timeout_secs = 15

Sources: README.md:346-365


Persistence Strategy

Memory Snapshot System

ZeroClaw exports MemoryCategory::Core to MEMORY_SNAPSHOT.md for Git visibility and disaster recovery.

flowchart TD
    A["Agent Startup"] --> B{"brain.db exists?"}
    B -->|No| C{"MEMORY_SNAPSHOT.md exists?"}
    C -->|Yes| D["hydrate_from_snapshot()"]
    D --> E["Recreate brain.db from snapshot"]
    E --> F["Normal operation"]
    B -->|Yes| F
    C -->|No| F
    
    F --> G["Periodic export_snapshot()"]
    G --> H["Write core memories to<br/>MEMORY_SNAPSHOT.md"]

Sources: src/memory/snapshot.rs:1-471

Key Functions:

| Function | Purpose | Trigger |
|---|---|---|
| export_snapshot() | Export core memories to Markdown | Manual / on-shutdown |
| hydrate_from_snapshot() | Restore from Markdown to SQLite | Auto on cold-boot if DB missing |
| should_hydrate() | Check if hydration needed | Startup check |

File Locations:

  • Snapshot: <workspace>/MEMORY_SNAPSHOT.md
  • Database: <workspace>/memory/brain.db

Sources: src/memory/snapshot.rs:26-200
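
The cold-boot decision in the table can be sketched with the file locations above. A std-only sketch under those assumed paths, not the actual snapshot module:

```rust
use std::path::Path;

// Hydrate only when the database is missing but a snapshot survives.
fn should_hydrate(workspace: &Path) -> bool {
    let db = workspace.join("memory/brain.db");
    let snapshot = workspace.join("MEMORY_SNAPSHOT.md");
    !db.exists() && snapshot.exists()
}
```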

Backup Strategy

Production Checklist:

  1. Mount persistent volumes:

    volumes:
      - zeroclaw-data:/zeroclaw-data  # Must match WORKDIR in Dockerfile
  2. Periodic backup:

    # Backup entire workspace
    tar -czf zeroclaw-backup-$(date +%F).tar.gz ~/.zeroclaw/workspace
    
    # Or backup SQLite directly
    sqlite3 ~/.zeroclaw/workspace/memory/brain.db ".backup brain-$(date +%F).db"
  3. Git-track snapshot:

    cd ~/.zeroclaw/workspace
    git add MEMORY_SNAPSHOT.md
    git commit -m "Memory snapshot $(date +%F)"

Sources: docker-compose.yml:34-36, src/memory/snapshot.rs:17-90


Monitoring & Observability

Health Check Endpoints

graph LR
    A["Health Check"] --> B["/health endpoint"]
    A --> C["zeroclaw status"]
    A --> D["zeroclaw doctor"]
    
    B --> B1["Always public<br/>No authentication"]
    C --> C1["System status<br/>Config validation"]
    D --> D1["Deep diagnostics<br/>Channel health"]

Sources: README.md:225-233

HTTP Health Check:

curl -f http://localhost:3000/health
# Returns: {"status": "ok"}

Docker Compose Health Check:

healthcheck:
  test: ["CMD", "zeroclaw", "status"]
  interval: 60s
  timeout: 10s
  retries: 3
  start_period: 10s

Sources: docker-compose.yml:53-59

Diagnostics Commands

| Command | Purpose | Output |
|---|---|---|
| zeroclaw status | Overall system health | Config paths, provider, memory backend |
| zeroclaw doctor | Deep diagnostics | Daemon freshness, scheduler status |
| zeroclaw channel doctor | Channel health | Per-channel reachability, auth status |
| zeroclaw auth status | OAuth status | Profile validity, token expiry |

Sources: README.md:225-233

Logging Configuration

Tracing Levels via Environment:

export RUST_LOG=zeroclaw=info,zeroclaw::gateway=debug

Log Targets:

| Module | Key Events |
|---|---|
| zeroclaw::gateway | Request handling, pairing, rate limiting |
| zeroclaw::channels | Message ingestion, allowlist checks |
| zeroclaw::security | Authorization decisions, policy violations |
| zeroclaw::memory | Snapshot export/hydrate, query performance |

Production Recommendation: Use structured logging (JSON) for SIEM integration:

// tracing-subscriber with JSON formatter (requires the crate's
// "json" and "env-filter" features)
use tracing_subscriber::EnvFilter;

tracing_subscriber::fmt()
    .json()
    .with_env_filter(EnvFilter::from_default_env())
    .init();

Sources: Cargo.toml:40-42

Metrics (Prometheus)

ZeroClaw includes the prometheus crate for metrics export.

Configuration:

# Future: metrics endpoint configuration
[observability]
metrics_enabled = true
metrics_port = 9090

Available Metrics (from code structure):

  • Request counts by endpoint
  • Rate limit violations
  • Provider API call latency
  • Memory operation latency

Sources: Cargo.toml:45


Performance Tuning

Build Profiles

graph TD
    A["Build Target"] --> B{"Build Profile"}
    B -->|Development| C["cargo build"]
    B -->|Production| D["cargo build --release"]
    B -->|High-memory machines| E["cargo build --profile release-fast"]
    
    C --> C1["Debug symbols<br/>No optimization<br/>Fast compile"]
    D --> D1["opt-level=z<br/>codegen-units=1<br/>3.4 MB binary"]
    E --> E1["opt-level=z<br/>codegen-units=8<br/>Faster compile"]

Sources: Cargo.toml:161-173

Production Build:

cargo build --release --locked
# Binary size: ~8.8 MB on macOS arm64 (measured Feb 2026)
# Memory footprint: ~4-5 MB for common CLI operations

Docker Multi-Stage Build:

# Stage 1: Builder (cached dependencies)
FROM rust:1.93-slim AS builder
COPY Cargo.toml Cargo.lock ./
RUN cargo build --release --locked

# Stage 2: Production runtime (distroless)
FROM gcr.io/distroless/cc-debian13:nonroot AS release
COPY --from=builder /app/zeroclaw /usr/local/bin/zeroclaw

Sources: Dockerfile:1-113, README.md:63-98

Runtime Optimization

Memory Backend Performance:

| Backend | Query Latency | Write Latency | Storage | Use Case |
|---|---|---|---|---|
| sqlite | ~2ms (FTS5) | ~5ms | Local file | Single-agent, full search |
| postgres | ~10ms (network) | ~15ms | Remote DB | Multi-agent, shared state |
| markdown | ~1ms (grep) | ~0.5ms | .md files | Human-readable, Git-tracked |
| none | 0ms | 0ms | None | Stateless, ephemeral |

Sources: README.md:330-377

Provider Resilience:

# ReliableProvider wraps all providers with retry logic
[provider]
max_retries = 3
backoff_multiplier = 2.0
timeout_secs = 60
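
The delay schedule these settings imply can be sketched as follows; the base delay of 1 s is an assumption for illustration, and `backoff_delays` is a hypothetical helper, not the actual ReliableProvider code.

```rust
// Compute the exponential backoff schedule: each retry waits
// base * multiplier^attempt seconds.
fn backoff_delays(base_secs: f64, multiplier: f64, max_retries: u32) -> Vec<f64> {
    (0..max_retries)
        .map(|attempt| base_secs * multiplier.powi(attempt as i32))
        .collect()
}
```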

Key resilience features:

  • Exponential backoff on transient errors
  • API key rotation (multiple keys in env)
  • Model fallback (default_model → fallback_model)

Sources: architecture diagrams; providers module structure

Browser Backend Selection

[browser]
backend = "auto"  # "agent_browser", "rust_native", "computer_use", "auto"

Performance Comparison:

| Backend | Startup | Memory | Availability |
|---|---|---|---|
| agent_browser | ~200ms (Node.js) | ~100 MB | npm install |
| rust_native | ~50ms | ~30 MB | cargo build --features browser-native |
| computer_use | ~10ms (sidecar) | ~50 MB | External sidecar |

Sources: src/tools/browser.rs:1-700


Network Configuration

Gateway Binding

[gateway]
host = "127.0.0.1"           # Localhost-only (production default)
port = 3000
allow_public_bind = false    # Refuse 0.0.0.0 without tunnel

Public Bind Protection:

// Gateway refuses 0.0.0.0 unless tunnel active or explicit override
if is_public_bind(&host) && !tunnel_active && !allow_public_bind {
    anyhow::bail!("Refusing public bind without tunnel");
}
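
A minimal sketch of what such a predicate might check; the actual is_public_bind lives in the gateway code, and this list of wildcard hosts is an assumption.

```rust
// Treat wildcard bind addresses (IPv4 and IPv6 forms) as public.
fn is_public_bind(host: &str) -> bool {
    matches!(host, "0.0.0.0" | "::" | "[::]")
}
```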

Sources: src/security/pairing.rs:225-230, README.md:387-391

Tunnel Requirements

Production deployments must use tunnels for remote access:

[tunnel]
provider = "cloudflare"  # "cloudflare", "tailscale", "ngrok", "custom"

Tunnel Matrix:

| Provider | Transport | Use Case |
|---|---|---|
| Cloudflare | HTTPS | Public webhook endpoints (WhatsApp, etc.) |
| Tailscale | WireGuard | Private mesh networks |
| ngrok | HTTPS | Development, temporary exposure |
| Custom | Any | Custom tunnel binary |

HTTPS Enforcement:

  • WhatsApp webhook: Requires HTTPS (Meta Cloud API validation)
  • Pairing over public net: Bearer tokens should only traverse HTTPS

Sources: README.md:456-491

Port Mapping

# docker-compose.yml
ports:
  - "${HOST_PORT:-3000}:3000"  # Override with HOST_PORT=8080

Production Recommendation: Use non-standard ports (e.g., 8443) to reduce automated scanner noise.

Sources: docker-compose.yml:38-40


Scaling Considerations

Horizontal Scaling

ZeroClaw's stateless design enables horizontal scaling with a shared backend:

graph TD
    LB["Load Balancer"] --> G1["Gateway Instance 1"]
    LB --> G2["Gateway Instance 2"]
    LB --> G3["Gateway Instance 3"]
    
    G1 --> PG["PostgreSQL<br/>Shared Memory"]
    G2 --> PG
    G3 --> PG
    
    G1 --> RD["Redis<br/>Rate Limit State"]
    G2 --> RD
    G3 --> RD

Configuration:

[memory]
backend = "postgres"
[storage.provider.config]
provider = "postgres"
db_url = "postgres://shared-db:5432/zeroclaw"

# Rate limiting requires shared state (not yet implemented)
# Future: Redis adapter for distributed rate limiting

Sources: README.md:346-365

Resource Scaling

Single-agent optimal configuration:

  • CPU: 1-2 cores
  • Memory: 512 MB - 2 GB (depends on memory backend size)
  • Storage: 1 GB (SQLite + workspace)

Multi-agent coordinator configuration:

  • CPU: 4+ cores (parallel tool execution)
  • Memory: 4-8 GB (multiple sub-agent contexts)
  • Storage: 10 GB+ (large conversation histories)

Sources: docker-compose.yml:42-50, README.md:63-74


Production Deployment Checklist

Security

  • Enable secret encryption (secrets.encrypt = true)
  • Enable gateway pairing (gateway.require_pairing = true)
  • Configure channel allowlists (no ["*"] wildcards)
  • Enable workspace scoping (autonomy.workspace_only = true)
  • Use Docker runtime for untrusted tools (runtime.kind = "docker")
  • Configure tunnel provider (tunnel.provider)
  • Restrict forbidden paths (autonomy.forbidden_paths)

Resource Management

  • Set CPU limits (deploy.resources.limits.cpus)
  • Set memory limits (deploy.resources.limits.memory)
  • Configure runtime constraints (runtime.docker.memory_limit_mb)
  • Select appropriate memory backend (memory.backend)

Persistence

  • Mount persistent volumes (volumes: zeroclaw-data:/zeroclaw-data)
  • Schedule backups (SQLite + MEMORY_SNAPSHOT.md)
  • Git-track workspace for version control
  • Test restore procedure

Monitoring

  • Configure health checks (healthcheck.test)
  • Set up structured logging (RUST_LOG)
  • Enable metrics endpoint (when available)
  • Configure alerting on health check failures

Performance

  • Build with --release --locked
  • Use appropriate build profile (release vs release-fast)
  • Tune provider timeout (provider.timeout_secs)
  • Select optimal browser backend (browser.backend)

Network

  • Bind to localhost (gateway.host = "127.0.0.1")
  • Configure tunnel (tunnel.provider)
  • Use non-standard ports in production
  • Enforce HTTPS for public endpoints

Sources: All sections above
