13.3 Production Configuration
This page covers production deployment configuration best practices for ZeroClaw, including security hardening, resource limits, monitoring, and operational tuning. For deployment methods, see Docker Deployment and Native Binary Deployment. For the configuration file reference, see Configuration File Reference.
ZeroClaw uses a three-tier configuration system with environment variables taking precedence over config.toml, which takes precedence over built-in defaults.
flowchart LR
A["Environment Variables"] -->|highest priority| B["config.toml"]
B --> C["Built-in Defaults"]
C -->|lowest priority| D["Runtime Behavior"]
Sources: README.md:492-599, src/security/secrets.rs:1-227
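The precedence rule above can be illustrated with a minimal sketch. The function name `resolve_setting` and the rule that an empty environment variable counts as unset are assumptions made for illustration, not ZeroClaw's actual loader.

```rust
/// Resolve one setting using the documented precedence:
/// environment variable, then config.toml, then built-in default.
/// Hypothetical helper: the inputs are already-loaded values.
pub fn resolve_setting(
    env_value: Option<&str>,
    file_value: Option<&str>,
    default: &str,
) -> String {
    env_value
        // Assumption: an empty environment variable is treated as unset.
        .filter(|v| !v.is_empty())
        // Fall back to the value read from config.toml, if any.
        .or(file_value)
        // Finally fall back to the built-in default.
        .unwrap_or(default)
        .to_string()
}
```

For example, `resolve_setting(Some("8080"), Some("3000"), "3000")` yields `"8080"`, because the environment value wins over both the file value and the default.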
| File Path | Purpose | Required |
|---|---|---|
| ~/.zeroclaw/config.toml | Primary configuration | Yes |
| ~/.zeroclaw/.secret_key | Secret encryption key | Auto-created |
| ~/.zeroclaw/auth-profiles.json | OAuth profiles (encrypted) | Optional |
| workspace/MEMORY_SNAPSHOT.md | Memory backup | Auto-generated |
Sources: README.md:492-599, src/security/secrets.rs:36-51
flowchart TD
Request["Incoming Request"] --> L1["Layer 1: Network Isolation"]
L1 --> L2["Layer 2: Authentication"]
L2 --> L3["Layer 3: Authorization"]
L3 --> L4["Layer 4: Execution Isolation"]
L4 --> L5["Layer 5: Data Protection"]
L1 --> L1A["127.0.0.1 bind<br/>Tunnel required"]
L2 --> L2A["PairingGuard<br/>Bearer tokens"]
L3 --> L3A["SecurityPolicy<br/>Autonomy levels<br/>Allowlists"]
L4 --> L4A["RuntimeAdapter<br/>Docker sandbox"]
L5 --> L5A["SecretStore<br/>ChaCha20-Poly1305"]
Sources: README.md:380-431, src/security/pairing.rs:1-231, src/security/secrets.rs:1-227
ZeroClaw encrypts secrets at rest using ChaCha20-Poly1305 AEAD with a local key file.
Configuration:
[secrets]
encrypt = true # Enable encryption (default: true)
Key File Location: ~/.zeroclaw/.secret_key (mode 0600)
Encryption Format:
- Current: enc2:<hex(nonce || ciphertext || tag)> (ChaCha20-Poly1305)
- Legacy: enc:<hex(xor_ciphertext)> (auto-migrates on load)
Migration Detection:
// Check if secret needs upgrade from legacy XOR
SecretStore::needs_migration(value) // Returns true for "enc:" prefix
Production Recommendation: Always enable encryption. For compliance scenarios requiring plaintext (audit logs, SIEM integration), set encrypt = false and use external secret management (Vault, AWS Secrets Manager).
Sources: src/security/secrets.rs:1-283, README.md:556-558
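The prefix-based format detection described above can be sketched as follows. The `SecretFormat` enum and `classify_secret` function are hypothetical names introduced for illustration. Only the `enc2:` and `enc:` prefixes and the behavior of `needs_migration` come from this page.

```rust
/// Storage formats distinguished by prefix (hypothetical enum for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum SecretFormat {
    /// "enc2:" prefix: ChaCha20-Poly1305 ciphertext.
    Current,
    /// "enc:" prefix: legacy XOR ciphertext that should be upgraded.
    LegacyXor,
    /// No known prefix: treated as a plaintext value.
    Plaintext,
}

/// Classify a stored value by its prefix. "enc2:" is checked first for clarity,
/// although "enc2:..." never starts with "enc:" because its fourth character is '2'.
pub fn classify_secret(value: &str) -> SecretFormat {
    if value.starts_with("enc2:") {
        SecretFormat::Current
    } else if value.starts_with("enc:") {
        SecretFormat::LegacyXor
    } else {
        SecretFormat::Plaintext
    }
}

/// Mirrors the documented check: true only for the legacy "enc:" prefix.
pub fn needs_migration(value: &str) -> bool {
    classify_secret(value) == SecretFormat::LegacyXor
}
```

Checking the prefix alone says nothing about whether the payload decrypts correctly. It only decides which code path a loader would take.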
Production gateway configuration enforces two-factor protection: a one-time pairing code plus a bearer token.
[gateway]
port = 3000
host = "127.0.0.1" # Localhost-only binding
require_pairing = true # Enforce pairing (default: true)
allow_public_bind = false # Refuse 0.0.0.0 without tunnel
Pairing Flow:
sequenceDiagram
participant G as Gateway
participant C as Client
participant PG as PairingGuard
G->>PG: Generate 6-digit code on startup
G->>G: Print code to console
C->>G: POST /pair<br/>X-Pairing-Code: 123456
G->>PG: try_pair(code)
PG->>PG: Verify code (constant-time compare)
PG->>PG: Generate bearer token (zc_<64-hex>)
PG->>PG: Hash token (SHA-256) for storage
PG-->>C: {"token": "zc_..."}
C->>G: POST /webhook<br/>Authorization: Bearer zc_...
G->>PG: is_authenticated(token)
PG->>PG: Compare SHA-256(token) against stored hash
PG-->>G: true/false
Sources: src/security/pairing.rs:36-151, README.md:525-530
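The constant-time comparison step in the pairing flow can be sketched with the standard library alone. This is an illustrative version, not the code in src/security/pairing.rs. The function name is an assumption, and production code would normally rely on a vetted crate such as `subtle`, since an optimizing compiler is free to undo hand-written timing defenses.

```rust
/// Compare two byte strings without returning early at the first mismatch.
/// Hypothetical helper for illustration only.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    // The length itself is not treated as secret: pairing codes have a fixed size.
    if a.len() != b.len() {
        return false;
    }
    // Accumulate the XOR of every byte pair so the loop always runs to the end.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

A plain `==` on slices may stop at the first differing byte, which is the timing side channel this pattern is meant to avoid.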
Brute Force Protection:
- Max attempts: 5 failed pairing attempts
- Lockout duration: 300 seconds (5 minutes)
- Token storage: SHA-256 hashes only (no plaintext)
Production Best Practice: Rotate bearer tokens periodically using zeroclaw auth commands.
Sources: src/security/pairing.rs:16-36
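The lockout limits listed above (5 failed attempts, 300 seconds) could be tracked roughly as shown below. The `PairingLockout` type and its methods are hypothetical, and the choice to reset the failure counter when a lockout starts is an assumption. The real `PairingGuard` may differ.

```rust
use std::time::{Duration, Instant};

/// Limits taken from the list above.
pub const MAX_ATTEMPTS: u32 = 5;
pub const LOCKOUT: Duration = Duration::from_secs(300);

/// Hypothetical failure tracker for pairing attempts.
#[derive(Default)]
pub struct PairingLockout {
    failed: u32,
    locked_until: Option<Instant>,
}

impl PairingLockout {
    /// True while the lockout window is still open at `now`.
    pub fn is_locked(&self, now: Instant) -> bool {
        matches!(self.locked_until, Some(until) if now < until)
    }

    /// Record a failed attempt and start a lockout once the limit is reached.
    pub fn record_failure(&mut self, now: Instant) {
        if self.is_locked(now) {
            return; // Attempts made during a lockout are ignored.
        }
        self.failed += 1;
        if self.failed >= MAX_ATTEMPTS {
            self.locked_until = Some(now + LOCKOUT);
            self.failed = 0; // Assumption: the counter restarts after each lockout.
        }
    }

    /// A successful pairing clears all failure state.
    pub fn record_success(&mut self) {
        self.failed = 0;
        self.locked_until = None;
    }
}
```

Passing `now` in explicitly, instead of calling `Instant::now()` inside the methods, keeps the logic deterministic and easy to test.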
Production deployments should use explicit allowlists for all channels to prevent unauthorized access.
Default Behavior: Empty allowlist = deny all inbound messages
[channels_config.telegram]
allowed_users = ["alice_username", "123456789"] # Username or numeric ID
[channels_config.discord]
allowed_users = ["987654321098765432"] # Discord user ID
[channels_config.slack]
allowed_users = ["U01234ABC"] # Slack member ID
Wildcard (Testing Only):
allowed_users = ["*"] # ⚠️ Allow all; use only for temporary testing
Sources: README.md:394-438
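The allowlist semantics described above (an empty list denies everyone, `"*"` allows everyone) can be sketched as a single check. The function name `is_user_allowed` and the use of exact, case-sensitive matching are assumptions for illustration.

```rust
/// Decide whether a sender may talk to the agent.
/// Hypothetical helper: an empty allowlist denies everyone, "*" allows everyone.
pub fn is_user_allowed(allowed_users: &[&str], sender: &str) -> bool {
    // `any` on an empty slice returns false, which gives deny-by-default.
    allowed_users
        .iter()
        .any(|entry| *entry == "*" || *entry == sender)
}
```

With the Telegram example above, `is_user_allowed(&["alice_username", "123456789"], "123456789")` is true, while any sender checked against an empty list is rejected.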
[autonomy]
workspace_only = true # Restrict to workspace directory (default: true)
forbidden_paths = [
"/etc", "/root", "/proc", "/sys",
"~/.ssh", "~/.gnupg", "~/.aws"
]
Built-in Protections:
- 14 system directories blocked by default
- Null byte injection detection
- Symlink escape prevention via path canonicalization
Sources: README.md:385-390
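The protections above can be approximated by a purely lexical check, sketched below under stated assumptions. The function name is hypothetical, Unix-style paths are assumed, `~` is not expanded, and `..` components are rejected outright instead of being resolved. The real implementation is described as canonicalizing paths, which also catches symlink escapes that this sketch cannot see.

```rust
use std::path::{Component, Path};

/// Lexical path check (hypothetical helper, Unix-style paths assumed).
pub fn is_path_allowed(candidate: &str, workspace: &Path, forbidden: &[&Path]) -> bool {
    // Null byte injection: reject before touching any path logic.
    if candidate.contains('\0') {
        return false;
    }
    let path = Path::new(candidate);
    // Reject ".." so a relative path cannot climb out of the workspace.
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return false;
    }
    // Relative paths are interpreted inside the workspace.
    let resolved = if path.is_absolute() {
        path.to_path_buf()
    } else {
        workspace.join(path)
    };
    // Explicitly forbidden prefixes always lose.
    if forbidden.iter().any(|f| resolved.starts_with(f)) {
        return false;
    }
    // workspace_only = true: everything else must live under the workspace.
    resolved.starts_with(workspace)
}
```

`Path::starts_with` compares whole components, so `/etcetera` is not treated as being inside `/etc`.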
Production docker-compose.yml example with resource limits:
services:
  zeroclaw:
    image: ghcr.io/zeroclaw-labs/zeroclaw:latest
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M
Sources: docker-compose.yml:42-50
[runtime]
kind = "docker" # "native" or "docker"
[runtime.docker]
image = "alpine:3.20"
network = "none" # Isolate from network
memory_limit_mb = 512 # Hard limit
cpu_limit = 1.0 # CPU shares (1.0 = 1 core)
read_only_rootfs = true # Immutable root filesystem
mount_workspace = true # Mount workspace at /workspace
Production Recommendation: Use docker runtime for untrusted tool execution. Use native for trusted environments where Docker overhead is unacceptable.
Sources: README.md:537-548
graph LR
A["Workload Type"] --> B{"Data Volume"}
B -->|< 10k memories| C["sqlite<br/>Full-stack search<br/>Hybrid vector + keyword"]
B -->|> 100k memories| D["postgres<br/>Remote persistence<br/>Multi-agent shared state"]
B -->|Ephemeral| E["none<br/>No-op backend<br/>Stateless operation"]
Sources: README.md:330-377
Configuration:
[memory]
backend = "sqlite" # "sqlite", "postgres", "lucid", "markdown", "none"
auto_save = true # Auto-persist conversations
# PostgreSQL example
[storage.provider.config]
provider = "postgres"
db_url = "postgres://user:pass@host:5432/zeroclaw"
schema = "public"
table = "memories"
connect_timeout_secs = 15
Sources: README.md:346-365
ZeroClaw exports MemoryCategory::Core to MEMORY_SNAPSHOT.md for Git visibility and disaster recovery.
flowchart TD
A["Agent Startup"] --> B{"brain.db exists?"}
B -->|No| C{"MEMORY_SNAPSHOT.md exists?"}
C -->|Yes| D["hydrate_from_snapshot()"]
D --> E["Recreate brain.db from snapshot"]
E --> F["Normal operation"]
B -->|Yes| F
C -->|No| F
F --> G["Periodic export_snapshot()"]
G --> H["Write core memories to<br/>MEMORY_SNAPSHOT.md"]
Sources: src/memory/snapshot.rs:1-471
Key Functions:
| Function | Purpose | Trigger |
|---|---|---|
| export_snapshot() | Export core memories to Markdown | Manual / on-shutdown |
| hydrate_from_snapshot() | Restore from Markdown to SQLite | Auto on cold-boot if DB missing |
| should_hydrate() | Check if hydration needed | Startup check |
File Locations:
- Snapshot: <workspace>/MEMORY_SNAPSHOT.md
- Database: <workspace>/memory/brain.db
Sources: src/memory/snapshot.rs:26-200
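The cold-boot decision in the flowchart reduces to two existence checks. The sketch below is an illustration: the `StartupAction` enum and both function names are hypothetical, while the file locations are the ones listed above.

```rust
use std::path::Path;

/// Possible outcomes of the startup check (hypothetical enum for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum StartupAction {
    /// brain.db is present: use it as-is.
    UseExistingDb,
    /// brain.db is missing but a snapshot exists: rebuild the database from it.
    HydrateFromSnapshot,
    /// Neither file exists: start with an empty memory.
    StartFresh,
}

/// Pure decision logic matching the flowchart.
pub fn startup_action(db_exists: bool, snapshot_exists: bool) -> StartupAction {
    match (db_exists, snapshot_exists) {
        (true, _) => StartupAction::UseExistingDb,
        (false, true) => StartupAction::HydrateFromSnapshot,
        (false, false) => StartupAction::StartFresh,
    }
}

/// Apply the decision to a workspace directory using the documented file locations.
pub fn startup_action_for(workspace: &Path) -> StartupAction {
    startup_action(
        workspace.join("memory").join("brain.db").exists(),
        workspace.join("MEMORY_SNAPSHOT.md").exists(),
    )
}
```

An existing database always wins over the snapshot, so a stale MEMORY_SNAPSHOT.md never overwrites live data.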
Production Checklist:
- Mount persistent volumes:
  volumes:
    - zeroclaw-data:/zeroclaw-data # Must match WORKDIR in Dockerfile
- Periodic backup:
  # Backup entire workspace
  tar -czf zeroclaw-backup-$(date +%F).tar.gz ~/.zeroclaw/workspace
  # Or backup SQLite directly
  sqlite3 ~/.zeroclaw/workspace/memory/brain.db ".backup brain-$(date +%F).db"
- Git-track snapshot:
  cd ~/.zeroclaw/workspace
  git add MEMORY_SNAPSHOT.md
  git commit -m "Memory snapshot $(date +%F)"
Sources: docker-compose.yml:34-36, src/memory/snapshot.rs:17-90
graph LR
A["Health Check"] --> B["/health endpoint"]
A --> C["zeroclaw status"]
A --> D["zeroclaw doctor"]
B --> B1["Always public<br/>No authentication"]
C --> C1["System status<br/>Config validation"]
D --> D1["Deep diagnostics<br/>Channel health"]
Sources: README.md:225-233
HTTP Health Check:
curl -f http://localhost:3000/health
# Returns: {"status": "ok"}
Docker Compose Health Check:
healthcheck:
  test: ["CMD", "zeroclaw", "status"]
  interval: 60s
  timeout: 10s
  retries: 3
  start_period: 10s
Sources: docker-compose.yml:53-59
| Command | Purpose | Output |
|---|---|---|
| zeroclaw status | Overall system health | Config paths, provider, memory backend |
| zeroclaw doctor | Deep diagnostics | Daemon freshness, scheduler status |
| zeroclaw channel doctor | Channel health | Per-channel reachability, auth status |
| zeroclaw auth status | OAuth status | Profile validity, token expiry |
Sources: README.md:225-233
Tracing Levels via Environment:
export RUST_LOG=zeroclaw=info,zeroclaw::gateway=debug
Log Targets:
| Module | Key Events |
|---|---|
| zeroclaw::gateway | Request handling, pairing, rate limiting |
| zeroclaw::channels | Message ingestion, allowlist checks |
| zeroclaw::security | Authorization decisions, policy violations |
| zeroclaw::memory | Snapshot export/hydrate, query performance |
Production Recommendation: Use structured logging (JSON) for SIEM integration:
// tracing-subscriber with JSON formatter
// (requires the crate's "json" and "env-filter" features)
use tracing_subscriber::EnvFilter;
tracing_subscriber::fmt()
    .json()
    .with_env_filter(EnvFilter::from_default_env())
    .init();
Sources: Cargo.toml:40-42
ZeroClaw includes the prometheus crate for metrics export.
Configuration:
# Future: metrics endpoint configuration
[observability]
metrics_enabled = true
metrics_port = 9090
Available Metrics (from code structure):
- Request counts by endpoint
- Rate limit violations
- Provider API call latency
- Memory operation latency
Sources: Cargo.toml:45
graph TD
A["Build Target"] --> B{"Build Profile"}
B -->|Development| C["cargo build"]
B -->|Production| D["cargo build --release"]
B -->|High-memory machines| E["cargo build --profile release-fast"]
C --> C1["Debug symbols<br/>No optimization<br/>Fast compile"]
D --> D1["opt-level=z<br/>codegen-units=1<br/>3.4 MB binary"]
E --> E1["opt-level=z<br/>codegen-units=8<br/>Faster compile"]
Sources: Cargo.toml:161-173
Production Build:
cargo build --release --locked
# Binary size: ~8.8 MB on macOS arm64 (measured Feb 2026)
# Memory footprint: ~4-5 MB for common CLI operations
Docker Multi-Stage Build:
# Stage 1: Builder (cached dependencies)
FROM rust:1.93-slim AS builder
COPY Cargo.toml Cargo.lock ./
RUN cargo build --release --locked
# Stage 2: Production runtime (distroless)
FROM gcr.io/distroless/cc-debian13:nonroot AS release
COPY --from=builder /app/zeroclaw /usr/local/bin/zeroclaw
Sources: Dockerfile:1-113, README.md:63-98
Memory Backend Performance:
| Backend | Query Latency | Write Latency | Storage | Use Case |
|---|---|---|---|---|
| sqlite | ~2ms (FTS5) | ~5ms | Local file | Single-agent, full search |
| postgres | ~10ms (network) | ~15ms | Remote DB | Multi-agent, shared state |
| markdown | ~1ms (grep) | ~0.5ms | .md files | Human-readable, Git-tracked |
| none | 0ms | 0ms | None | Stateless, ephemeral |
Sources: README.md:330-377
Provider Resilience:
# ReliableProvider wraps all providers with retry logic
[provider]
max_retries = 3
backoff_multiplier = 2.0
timeout_secs = 60
Key resilience features:
- Exponential backoff on transient errors
- API key rotation (multiple keys in env)
- Model fallback (default_model → fallback_model)
Sources: Per architecture diagrams, providers module structure
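The retry settings above imply a delay schedule that grows by `backoff_multiplier` on each attempt. The sketch below computes such a schedule. The function name and the `base_ms` starting delay are assumptions, since this page does not state the initial delay used by ReliableProvider.

```rust
/// Compute the wait before each retry, in milliseconds.
/// Hypothetical helper: `base_ms` is an assumed starting delay.
pub fn backoff_schedule_ms(max_retries: u32, base_ms: u64, multiplier: f64) -> Vec<u64> {
    (0..max_retries)
        // The delay for retry n is base * multiplier^n, rounded to whole milliseconds.
        .map(|attempt| (base_ms as f64 * multiplier.powi(attempt as i32)).round() as u64)
        .collect()
}
```

With `max_retries = 3`, `backoff_multiplier = 2.0`, and an assumed 1000 ms base, the waits are 1000, 2000, and 4000 ms. Real clients often add random jitter to these values so that many instances do not retry in lockstep.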
[browser]
backend = "auto" # "agent_browser", "rust_native", "computer_use", "auto"
Performance Comparison:
| Backend | Startup | Memory | Availability |
|---|---|---|---|
| agent_browser | ~200ms (Node.js) | ~100 MB | npm install |
| rust_native | ~50ms | ~30 MB | cargo build --features browser-native |
| computer_use | ~10ms (sidecar) | ~50 MB | External sidecar |
Sources: src/tools/browser.rs:1-700
[gateway]
host = "127.0.0.1" # Localhost-only (production default)
port = 3000
allow_public_bind = false # Refuse 0.0.0.0 without tunnel
Public Bind Protection:
// Gateway refuses 0.0.0.0 unless tunnel active or explicit override
if is_public_bind(&host) && !tunnel_active && !allow_public_bind {
    anyhow::bail!("Refusing public bind without tunnel");
}
Sources: src/security/pairing.rs:225-230, README.md:387-391
Production deployments must use tunnels for remote access:
[tunnel]
provider = "cloudflare" # "cloudflare", "tailscale", "ngrok", "custom"
Tunnel Matrix:
| Provider | Transport | Use Case |
|---|---|---|
| Cloudflare | HTTPS | Public webhook endpoints (WhatsApp, etc.) |
| Tailscale | Wireguard | Private mesh networks |
| ngrok | HTTPS | Development, temporary exposure |
| Custom | Any | Custom tunnel binary |
HTTPS Enforcement:
- WhatsApp webhook: Requires HTTPS (Meta Cloud API validation)
- Pairing over public net: Bearer tokens should only traverse HTTPS
Sources: README.md:456-491
# docker-compose.yml
ports:
  - "${HOST_PORT:-3000}:3000" # Override with HOST_PORT=8080
Production Recommendation: Use non-standard ports (e.g., 8443) to reduce automated scanner noise.
Sources: docker-compose.yml:38-40
ZeroClaw's stateless design enables horizontal scaling with a shared backend:
graph TD
LB["Load Balancer"] --> G1["Gateway Instance 1"]
LB --> G2["Gateway Instance 2"]
LB --> G3["Gateway Instance 3"]
G1 --> PG["PostgreSQL<br/>Shared Memory"]
G2 --> PG
G3 --> PG
G1 --> RD["Redis<br/>Rate Limit State"]
G2 --> RD
G3 --> RD
Configuration:
[memory]
backend = "postgres"
[storage.provider.config]
provider = "postgres"
db_url = "postgres://shared-db:5432/zeroclaw"
# Rate limiting requires shared state (not yet implemented)
# Future: Redis adapter for distributed rate limiting
Sources: README.md:346-365
Single-agent optimal configuration:
- CPU: 1-2 cores
- Memory: 512 MB - 2 GB (depends on memory backend size)
- Storage: 1 GB (SQLite + workspace)
Multi-agent coordinator configuration:
- CPU: 4+ cores (parallel tool execution)
- Memory: 4-8 GB (multiple sub-agent contexts)
- Storage: 10 GB+ (large conversation histories)
Sources: docker-compose.yml:42-50, README.md:63-74
- Enable secret encryption (secrets.encrypt = true)
- Enable gateway pairing (gateway.require_pairing = true)
- Configure channel allowlists (no ["*"] wildcards)
- Enable workspace scoping (autonomy.workspace_only = true)
- Use Docker runtime for untrusted tools (runtime.kind = "docker")
- Configure tunnel provider (tunnel.provider)
- Restrict forbidden paths (autonomy.forbidden_paths)

- Set CPU limits (deploy.resources.limits.cpus)
- Set memory limits (deploy.resources.limits.memory)
- Configure runtime constraints (runtime.docker.memory_limit_mb)
- Select appropriate memory backend (memory.backend)

- Mount persistent volumes (volumes: zeroclaw-data:/zeroclaw-data)
- Schedule backups (SQLite + MEMORY_SNAPSHOT.md)
- Git-track workspace for version control
- Test restore procedure

- Configure health checks (healthcheck.test)
- Set up structured logging (RUST_LOG)
- Enable metrics endpoint (when available)
- Configure alerting on health check failures

- Build with --release --locked
- Use appropriate build profile (release vs release-fast)
- Tune provider timeout (provider.timeout_secs)
- Select optimal browser backend (browser.backend)

- Bind to localhost (gateway.host = "127.0.0.1")
- Configure tunnel (tunnel.provider)
- Use non-standard ports in production
- Enforce HTTPS for public endpoints
Sources: All sections above