
Deployment Guide

Complete guide for deploying Perpendicularity in production environments.


📋 Table of Contents

  • Overview
  • Docker Deployment
  • Local Deployment
  • Cloud Deployment
  • Environment Configuration
  • Monitoring
  • Scaling
  • Quick Deploy Commands
  • See Also

🎯 Overview

Perpendicularity supports multiple deployment strategies:

  • Docker - Containerized deployment (recommended)
  • Local - Direct installation with uv/pip
  • AWS EC2 - Cloud deployment with GPU support
  • Kubernetes - Orchestrated deployment (advanced)

Recommended Stack:

  • Production: Docker on EC2 with Ollama
  • Development: Local installation with cloud models
  • Enterprise: Kubernetes with load balancing

🐳 Docker Deployment

Multi-Stage Build

The Dockerfile uses a multi-stage build to optimize image size and support optional local models.

Build Stages:

  1. Frontend Builder - Build React app
  2. Python Backend - Install Python dependencies
  3. Local Models (optional) - Install transformers for HuggingFace models

Build Without Local Models (Recommended)

For cloud models only (Gemini, Claude, Ollama via network):

# Build image (no local models)
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 .

# Image size: ~1.5GB

What's included:

  • ✅ Python runtime
  • ✅ Core dependencies (FastAPI, Click, LangChain)
  • ✅ Frontend (React app)
  • ✅ API server
  • ❌ PyTorch and Transformers (not needed for cloud models)

Use when:

  • Using Gemini or Claude (cloud APIs)
  • Using Ollama (separate service)
  • Want smaller Docker images
  • Don't need HuggingFace Transformers
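To confirm the slim build really excludes PyTorch, you can probe the image directly (a quick sketch; it assumes uv's default /app/.venv location inside the container):

# The import should fail in the slim image
docker run --rm perpendicularity:0.1.0 \
  /app/.venv/bin/python -c "import torch" 2>/dev/null \
  && echo "torch present (unexpected in the slim build)" \
  || echo "torch absent (expected)"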

Build With Local Models

For HuggingFace Transformers (models loaded directly in container):

# Build with local models support
docker buildx build --platform linux/amd64 \
  --build-arg INSTALL_LOCAL_MODELS=true \
  -t perpendicularity:0.1.0-gpu .

# Image size: ~8GB (includes PyTorch, CUDA libraries)

What's included:

  • ✅ Everything from base build
  • ✅ PyTorch with CUDA support
  • ✅ Transformers library
  • ✅ GPU acceleration libraries

Use when:

  • Loading models with HuggingFace Transformers
  • Running models inside container (not via Ollama)
  • Need full offline capability
  • Have GPU available
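Before deploying, it helps to confirm the GPU image actually sees CUDA (a sketch; assumes uv's default /app/.venv path):

# Should print True on a GPU host with nvidia-docker2 configured
docker run --rm --gpus all perpendicularity:0.1.0-gpu \
  /app/.venv/bin/python -c "import torch; print(torch.cuda.is_available())"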

Dockerfile Structure

# Stage 1: Build frontend
FROM node:20-alpine AS frontend-builder
WORKDIR /app/frontend
COPY frontend/package*.json ./
RUN npm ci
COPY frontend/ .
RUN npm run build

# Stage 2: Python dependencies
FROM python:3.11-slim AS base

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install uv
RUN pip install uv

WORKDIR /app

# Copy Python project files
COPY pyproject.toml uv.lock ./
COPY agent/ agent/
COPY cli/ cli/
COPY api/ api/
COPY config/ config/

# Install Python dependencies (without local-models)
RUN uv sync --extra api

# Stage 3: Add local models support (conditional)
FROM base AS local-models
ARG INSTALL_LOCAL_MODELS=false

# Install transformers and PyTorch only if requested
RUN if [ "$INSTALL_LOCAL_MODELS" = "true" ]; then \
      uv sync --extra local-models --extra api; \
    fi

# Final stage
FROM local-models AS final

# Copy frontend build
COPY --from=frontend-builder /app/frontend/dist api/static

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8000/api/health || exit 1

# Run API server
CMD ["perpendicularity", "api", "--host", "0.0.0.0", "--port", "8000"]
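A small build context keeps these stages fast. A minimal .dockerignore along these lines is worth adding (the entries are suggestions; adjust to your repo layout):

cat > .dockerignore <<'EOF'
.git
.venv
__pycache__/
*.pyc
frontend/node_modules
frontend/dist
EOF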

Running the Container

Basic Run (Cloud Models)

# Run with default config
docker run -d \
  --name perpendicularity \
  -p 8000:8000 \
  -e GOOGLE_API_KEY="${GOOGLE_API_KEY}" \
  -e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY}" \
  perpendicularity:0.1.0

# Access at http://localhost:8000
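After starting the container, a quick verification loop catches misconfiguration early:

# Confirm the container is running and the API responds
docker ps --filter name=perpendicularity
docker logs perpendicularity --tail 20
curl http://localhost:8000/api/health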

Run with Custom Config

# Run with custom config file
docker run -d \
  --name perpendicularity \
  -p 8000:8000 \
  -v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
  -e GOOGLE_API_KEY="${GOOGLE_API_KEY}" \
  perpendicularity:0.1.0 \
  perpendicularity api --config /app/config/agent_config.yaml

Run with Ollama (Host Network)

# Use host network to access Ollama at localhost:11434
docker run -d \
  --name perpendicularity \
  --network host \
  -v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
  perpendicularity:0.1.0 \
  perpendicularity api --config /app/config/agent_config.yaml

# Access at http://localhost:8000

Why --network host?

  • Container can access Ollama at localhost:11434
  • Simpler than bridge networking for local services
  • Container ports bind directly to host

Run with GPU (Local Models)

# Run with GPU access for HuggingFace models
docker run -d \
  --name perpendicularity \
  --gpus all \
  -p 8000:8000 \
  -v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
  perpendicularity:0.1.0-gpu \
  perpendicularity api --config /app/config/agent_config.yaml

# Verify GPU access
docker exec perpendicularity nvidia-smi

Requirements:

  • NVIDIA GPU
  • nvidia-docker2 installed
  • Image built with --build-arg INSTALL_LOCAL_MODELS=true
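If GPU access fails, smoke-test the host's NVIDIA container setup independently of Perpendicularity (the CUDA image tag below is just an example):

# nvidia-smi output here confirms the container toolkit is wired up correctly
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi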

Advanced Docker Options

Build Optimization

# buildx always uses BuildKit; the DOCKER_BUILDKIT=1 flag only matters for the legacy builder
DOCKER_BUILDKIT=1 docker build -t perpendicularity:0.1.0 .

# Multi-platform build (add --push to publish; the local Docker store cannot hold a multi-arch manifest without containerd)
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t perpendicularity:0.1.0 .

# Build with a specific Python version (requires an ARG PYTHON_VERSION declared in the Dockerfile)
docker buildx build --platform linux/amd64 \
  --build-arg PYTHON_VERSION=3.11 \
  -t perpendicularity:0.1.0 .

Cache Management

# Build with no cache
docker buildx build --platform linux/amd64 --no-cache -t perpendicularity:0.1.0 .

# Reuse layers from a previous image (the cache source must have been built with inline cache, e.g. --cache-to type=inline)
docker buildx build --platform linux/amd64 \
  --cache-from perpendicularity:latest \
  -t perpendicularity:0.1.0 .

Resource Limits

# Limit CPU and memory
docker run -d \
  --name perpendicularity \
  --cpus="2.0" \
  --memory="4g" \
  --memory-swap="4g" \
  -p 8000:8000 \
  perpendicularity:0.1.0

💻 Local Deployment

Production Installation

Using uv (Recommended):

# Clone repository
git clone https://github.com/t-neumann/perpendicularity.git
cd perpendicularity

# Install with production extras
uv sync --extra api

# Verify the installation
uv run perpendicularity --help
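For an unsupervised production host, wrapping the local install in a systemd unit keeps it running across reboots. A sketch (the paths, environment, and uv location are assumptions for your environment):

sudo tee /etc/systemd/system/perpendicularity.service > /dev/null <<'EOF'
[Unit]
Description=Perpendicularity API
After=network.target

[Service]
WorkingDirectory=/opt/perpendicularity
Environment=GOOGLE_API_KEY=replace-me
ExecStart=/usr/local/bin/uv run perpendicularity api --host 0.0.0.0 --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now perpendicularity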

☁️ Cloud Deployment

AWS EC2 (Recommended)

Complete EC2 setup with Ollama - see EC2 Setup Guide for detailed instructions.

Quick Overview:

# 1. Launch EC2 instance (g5.xlarge for GPU)
# 2. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b-instruct

# 3. Deploy Perpendicularity
git clone https://github.com/t-neumann/perpendicularity.git
cd perpendicularity
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 .
docker run -d --network host \
  -v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
  perpendicularity:0.1.0

# 4. Access at http://EC2_PUBLIC_IP:8000
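Before wiring the agent to Ollama, verify both services independently:

# Ollama should list the pulled model
curl http://localhost:11434/api/tags

# The deployed API should report healthy
curl http://localhost:8000/api/health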

⚙️ Environment Configuration

Environment Variables

Required for cloud models:

# Gemini API
export GOOGLE_API_KEY="AIza..."

# Claude API
export ANTHROPIC_API_KEY="sk-ant-..."

# Optional: Override config path
export PERPENDICULARITY_CONFIG="/path/to/config.yaml"
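Rather than exporting keys in your shell, an env file passed to Docker keeps secrets out of shell history (a sketch; placeholder values shown):

cat > .env <<'EOF'
GOOGLE_API_KEY=AIza...
ANTHROPIC_API_KEY=sk-ant-...
EOF
chmod 600 .env

docker run -d --name perpendicularity -p 8000:8000 --env-file .env perpendicularity:0.1.0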

Configuration Files

Production config structure:

config/
├── agent_config.yaml          # Main configuration
├── agent_config.prod.yaml     # Production overrides
├── agent_config.dev.yaml      # Development
└── prompts.yaml               # System prompts

Production config example:

# config/agent_config.prod.yaml

default_model: "ollama_qwen32b"  # High-quality local model

models:
  defaults:
    openai:
      base_url: "http://localhost:11434/v1"
  
  # Production-grade local model
  ollama_qwen32b:
    type: "openai"
    name: "qwen2.5:32b-instruct"

agent:
  type: "langgraph"
  recursion_limit: 25
  verbose: false  # Disable verbose in production

mcp_servers:
  genomic_ops:
    url: "http://genomic-server.internal:8000/mcp"
    timeout: 180
  
  txgemma:
    url: "http://txgemma-server.internal:8001/mcp"
    timeout: 180

logging:
  level: "WARNING"  # Only warnings and errors

Environment-Specific Deployment

# Development
perpendicularity api --config config/agent_config.dev.yaml

# Staging
perpendicularity api --config config/agent_config.staging.yaml

# Production
perpendicularity api --config config/agent_config.prod.yaml --workers 4

HTTPS/TLS

Production must use HTTPS!

Option 1: Nginx Reverse Proxy

# /etc/nginx/sites-available/perpendicularity

server {
    listen 80;
    server_name perpendicularity.example.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name perpendicularity.example.com;

    ssl_certificate /etc/letsencrypt/live/perpendicularity.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/perpendicularity.example.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
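Enable and validate the site before relying on it (the sites-enabled symlink is the Debian/Ubuntu convention):

sudo ln -s /etc/nginx/sites-available/perpendicularity /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx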

Get SSL certificate:

# Using Let's Encrypt
sudo certbot --nginx -d perpendicularity.example.com
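Certbot sets up automatic renewal; confirm it actually works before expiry forces the issue:

# Simulate renewal without touching the live certificate
sudo certbot renew --dry-run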

📊 Monitoring

Health Checks

Built-in endpoint:

# Check API health
curl http://localhost:8000/api/health

# Response:
{
  "status": "healthy",
  "service": "perpendicularity-api"
}
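The Docker HEALTHCHECK covers the container itself; an external probe can also restart a wedged container. A minimal cron-able sketch (the restart-on-failure policy is an assumption; adapt to your alerting):

#!/usr/bin/env bash
# healthcheck.sh - probe the API and restart the container on failure
if ! curl -fsS http://localhost:8000/api/health > /dev/null; then
  echo "$(date -Is) perpendicularity unhealthy, restarting" >&2
  docker restart perpendicularity
fi

Schedule it with cron, e.g. */5 * * * * /opt/perpendicularity/healthcheck.sh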

Logging

Configure logging in production:

# config/agent_config.yaml

logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  file: "/var/log/perpendicularity/api.log"
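When logging to a file, rotate it so the disk doesn't fill. A logrotate sketch matching the path above:

sudo tee /etc/logrotate.d/perpendicularity > /dev/null <<'EOF'
/var/log/perpendicularity/api.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
    copytruncate
}
EOF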

📈 Scaling

Horizontal Scaling

Multiple workers:

# Run with multiple workers
perpendicularity api --workers 4

# Or in Docker
docker run -d \
  -p 8000:8000 \
  perpendicularity:0.1.0 \
  perpendicularity api --workers 4
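Beyond in-process workers, you can run several containers and round-robin them behind the nginx proxy from the HTTPS section (the ports below are examples):

docker run -d --name perpendicularity-1 -p 8001:8000 perpendicularity:0.1.0
docker run -d --name perpendicularity-2 -p 8002:8000 perpendicularity:0.1.0

# In nginx, replace the single proxy_pass target with an upstream block
# listing 127.0.0.1:8001 and 127.0.0.1:8002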

🚀 Quick Deploy Commands

Development (Local)

# Clone and run locally
git clone https://github.com/t-neumann/perpendicularity.git
cd perpendicularity
uv sync --extra api
export GOOGLE_API_KEY="your-key"
perpendicularity api --reload

Production (Docker)

# Build and run in Docker
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 .
docker run -d \
  --name perpendicularity \
  --restart unless-stopped \
  -p 8000:8000 \
  -e GOOGLE_API_KEY="${GOOGLE_API_KEY}" \
  perpendicularity:0.1.0

Production (EC2 + Ollama)

# On EC2 instance
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b-instruct

git clone https://github.com/t-neumann/perpendicularity.git
cd perpendicularity
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 .
docker run -d --network host perpendicularity:0.1.0

See EC2 Setup Guide for complete instructions.


📚 See Also

  • EC2 Setup Guide - complete EC2 + Ollama walkthrough
  • Troubleshooting - common deployment issues

Deploy Perpendicularity with confidence! 🚀

For questions, see Troubleshooting or open an issue.