EC2 Setup Guide

Complete guide for deploying Perpendicularity on AWS EC2 with Ollama for cost-effective local model inference.


🎯 Overview

This guide shows you how to:

  1. Launch an EC2 instance with GPU
  2. Install Ollama and pull models
  3. Deploy Perpendicularity in Docker
  4. Configure security and networking
  5. Monitor and optimize costs

Result: Self-hosted Perpendicularity with local models - $0 per query after infrastructure costs.


💰 Cost Analysis

Infrastructure Costs (us-east-1, as of Feb 2025)

Instance Type   vCPU   RAM    GPU       VRAM   Cost/Hour   Cost/Month*
g5.xlarge       4      16GB   1x A10G   24GB   $1.006      $730
g5.2xlarge      8      32GB   1x A10G   24GB   $1.212      $880
g5.4xlarge      16     64GB   1x A10G   24GB   $1.624      $1,180

*24/7 operation. Use Spot instances for 60-70% savings.

Model Capacity

g5.xlarge (24GB VRAM):

  • ✅ 7B models (4-bit): ~4GB
  • ✅ 14B models (8-bit): ~15GB
  • ✅ 14B models (4-bit): ~7GB
  • ✅ 32B models (4-bit): ~18GB
  • ❌ 70B models: need g5.12xlarge (4x GPUs, 96GB VRAM)

Recommendation: g5.xlarge ($1/hr) is perfect for most use cases.
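
The sizes above follow a rough rule of thumb: weight memory in GB ≈ parameters (in billions) × bits per weight ÷ 8, plus a few GB for the KV cache and runtime overhead. A quick sanity check (illustrative arithmetic only):

awk 'BEGIN {
  printf "14B @ 4-bit: ~%.0f GB weights\n", 14 * 4 / 8;
  printf "32B @ 4-bit: ~%.0f GB weights\n", 32 * 4 / 8;
  printf "70B @ 4-bit: ~%.0f GB weights\n", 70 * 4 / 8;
}'

The 70B case lands at ~35GB of weights alone, which is why it needs the multi-GPU g5.12xlarge.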

Break-Even vs Cloud Models

Assuming 1,000 complex queries/day:

Model              Daily Cost   Monthly Cost   Break-Even
g5.xlarge (24/7)   $24          $730           Baseline
Gemini Flash       $300         $9,000         2.5 days
Claude Sonnet      $1,200       $36,000        0.6 days

Conclusion: EC2 with Ollama pays for itself in < 3 days for high-volume workloads.
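
The break-even figures are simply the monthly EC2 cost divided by each cloud model's daily cost (the per-query prices are rough assumptions, not quoted rates), matching the table above within rounding:

# Break-even in days = monthly EC2 cost / daily cloud-model cost
awk 'BEGIN {
  ec2_monthly = 730;                                  # g5.xlarge, 24/7
  printf "vs Gemini Flash:  %.1f days\n", ec2_monthly / 300;
  printf "vs Claude Sonnet: %.1f days\n", ec2_monthly / 1200;
}'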


🚀 Quick Start (15 minutes)

# 1. Launch EC2 instance (us-east-1)
# Instance type: g5.xlarge
# AMI: Ubuntu 22.04 LTS
# Storage: 100GB gp3
# Security group: Allow 22 (SSH), 8000 (API), 11434 (Ollama)

# 2. SSH into instance
ssh -i your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com

# 3. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 4. Pull models
ollama pull qwen2.5:14b-instruct
ollama pull deepseek-r1:8b

# 5. Install Docker
sudo apt update
sudo apt install -y docker.io
sudo systemctl start docker
sudo usermod -aG docker ubuntu
# Log out and back in for group changes

# 6. Clone Perpendicularity
git clone https://github.com/t-neumann/perpendicularity.git
cd perpendicularity

# 7. Configure MCP servers (edit URLs)
nano config/agent_config.yaml

# 8. Build and run
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 .
docker run -d --name perpendicularity --network host \
  -v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
  perpendicularity:0.1.0 \
  perpendicularity api --config /app/config/agent_config.yaml

# 9. Access
# Frontend: http://EC2-PUBLIC-IP:8000
# API Docs: http://EC2-PUBLIC-IP:8000/docs

📋 Detailed Setup

Step 1: Launch EC2 Instance

Via AWS Console

  1. Go to EC2 Dashboard → Launch Instance

  2. Name and Tags:

    • Name: perpendicularity-gpu
  3. Application and OS Images:

    • AMI: Deep Learning Base AMI with Single CUDA (Ubuntu 22.04), which ships with the NVIDIA driver preinstalled (plain Ubuntu 22.04 LTS also works if you install the driver yourself; see Troubleshooting)
    • Architecture: 64-bit (x86)
  4. Instance Type:

    • Family: Accelerated Computing
    • Type: g5.xlarge (4 vCPU, 16GB RAM, 24GB GPU)
    • Filter: Show "GPU instances"
  5. Key Pair:

    • Create new key pair or use existing
    • Download .pem file if new
    • Set permissions: chmod 400 your-key.pem
  6. Network Settings:

    • VPC: Default (or custom)
    • Subnet: No preference (or specific)
    • Auto-assign public IP: Enable
    • Firewall (Security Group): Create new
      • Allow SSH (22) from My IP
      • Allow Custom TCP (8000) from Anywhere (0.0.0.0/0)
      • Allow Custom TCP (11434) from My IP (for Ollama admin)
  7. Configure Storage:

    • Size: 100 GB (models need space)
    • Type: gp3 (better performance than gp2)
    • IOPS: 3000 (default)
    • Throughput: 125 MB/s (default)
  8. Advanced Details:

    • Spot instance: Consider for 60-70% savings (may be interrupted)
    • IAM instance profile: None needed (unless using AWS services)
  9. Launch Instance

Via AWS CLI

# Create security group
aws ec2 create-security-group \
  --group-name perpendicularity-sg \
  --description "Perpendicularity API and Ollama" \
  --vpc-id vpc-xxxxx

# Add inbound rules
MY_IP=$(curl -s https://checkip.amazonaws.com)
SG_ID=sg-xxxxx

aws ec2 authorize-security-group-ingress \
  --group-id $SG_ID \
  --protocol tcp --port 22 --cidr $MY_IP/32

aws ec2 authorize-security-group-ingress \
  --group-id $SG_ID \
  --protocol tcp --port 8000 --cidr 0.0.0.0/0

aws ec2 authorize-security-group-ingress \
  --group-id $SG_ID \
  --protocol tcp --port 11434 --cidr $MY_IP/32

# Launch instance (the AMI ID below is Ubuntu 22.04 LTS in us-east-1)
aws ec2 run-instances \
  --image-id ami-0c7217cdde317cfec \
  --instance-type g5.xlarge \
  --key-name your-key-name \
  --security-group-ids $SG_ID \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=perpendicularity-gpu}]'
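
Optionally wait until the instance reports running before trying to connect (the filter matches the Name tag set above):

# Block until the tagged instance is in the "running" state
aws ec2 wait instance-running \
  --filters "Name=tag:Name,Values=perpendicularity-gpu"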

Step 2: Connect to Instance

# Get public IP
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=perpendicularity-gpu" \
  --query 'Reservations[0].Instances[0].PublicIpAddress' \
  --output text

# SSH (replace with your IP and key)
ssh -i your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com

Step 3: Install System Dependencies

# Update system
sudo apt update && sudo apt upgrade -y

# Install essentials (nvtop provides GPU monitoring)
sudo apt install -y \
  git \
  curl \
  wget \
  htop \
  nvtop \
  build-essential

# Verify GPU
nvidia-smi

# Should show:
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 525.xx.xx    Driver Version: 525.xx.xx    CUDA Version: 12.0     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# |   0  NVIDIA A10G         Off  | 00000000:00:1E.0 Off |                    0 |
# +-------------------------------+----------------------+----------------------+
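
A more compact check of the same information, handy for scripting (these --query-gpu fields are standard nvidia-smi options):

# Print GPU name, total VRAM and driver version on one line (expect an A10G with ~24GB)
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader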

Step 4: Install Ollama

# Install Ollama (installs as systemd service)
curl -fsSL https://ollama.com/install.sh | sh

# Verify service is running
sudo systemctl status ollama

# Should show:
# ● ollama.service - Ollama Service
#    Loaded: loaded (/etc/systemd/system/ollama.service; enabled)
#    Active: active (running) since ...

# Verify API is accessible
curl http://localhost:11434/api/tags

# Should return:
# {"models":[]}

Ollama is now running as a systemd service:

  • Starts automatically on boot
  • Runs on port 11434
  • No need to manually run ollama serve
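
By default Ollama binds to 127.0.0.1, so port 11434 is only reachable from the instance itself (which is all the --network host container needs). If you also want to reach the Ollama API remotely through the 11434 rule in the security group, a minimal systemd override sketch (OLLAMA_HOST is Ollama's documented listen-address variable):

# Make Ollama listen on all interfaces instead of loopback only
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama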

Step 5: Pull Ollama Models

# Recommended: Qwen 2.5 14B (best balance)
ollama pull qwen2.5:14b-instruct

# Also recommended: DeepSeek R1 8B (reasoning specialist)
ollama pull deepseek-r1:8b

# Optional: Qwen 2.5 32B (highest quality, slower)
ollama pull qwen2.5:32b-instruct

# Optional: Qwen 2.5 7B (fastest)
ollama pull qwen2.5:7b-instruct

# Verify models
ollama list

# Should show:
# NAME                        ID              SIZE      MODIFIED
# qwen2.5:14b-instruct        abc123def       8.0 GB    2 minutes ago
# deepseek-r1:8b              def456ghi       4.7 GB    1 minute ago

VRAM Usage:

  • qwen2.5:7b → ~4GB
  • deepseek-r1:8b → ~5GB
  • qwen2.5:14b → ~8GB
  • qwen2.5:32b → ~18GB

Tip: Monitor VRAM usage with nvtop or nvidia-smi
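
To confirm a model actually fits and loads, run a one-shot prompt and then check what is resident:

# Load the model and generate a short reply (the first run pulls it into VRAM)
ollama run qwen2.5:14b-instruct "Reply with OK"

# List currently loaded models and their memory footprint
ollama ps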


Step 6: Install Docker

# Install Docker
sudo apt update
sudo apt install -y docker.io

# Start Docker service
sudo systemctl start docker
sudo systemctl enable docker

# Add user to docker group (avoid sudo)
sudo usermod -aG docker ubuntu

# Log out and back in for group changes to take effect
exit
ssh -i your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com

# Verify Docker works without sudo
docker --version
docker ps

# Should work without "permission denied"

Step 7: Deploy Perpendicularity

# Clone repository
git clone https://github.com/t-neumann/perpendicularity.git
cd perpendicularity

# Configure agent_config.yaml
nano config/agent_config.yaml

Edit configuration:

# Set default model to local Ollama
default_model: "ollama_qwen14b"

# Verify Ollama models are configured
models:
  defaults:
    openai:
      base_url: "http://localhost:11434/v1"  # Ollama endpoint

  ollama_qwen14b:
    type: "openai"
    name: "qwen2.5:14b-instruct"

  ollama_deepseek:
    type: "openai"
    name: "deepseek-r1:8b"

# Update MCP server URLs to your actual instances
mcp_servers:
  genomic_ops:
    url: "http://YOUR-GENOMIC-SERVER-IP:8000/mcp"
  
  txgemma:
    url: "http://YOUR-TXGEMMA-SERVER-IP:8000/mcp"
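
Before building the image, it is worth confirming that the Ollama OpenAI-compatible endpoint the config points at actually answers (the model tag must match one you pulled in Step 5):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:14b-instruct",
    "messages": [{"role": "user", "content": "Reply with OK"}]
  }'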

Build Docker image:

docker build -t perpendicularity:0.1.0 .

# Build takes 3-5 minutes
# Watch for errors (should complete successfully)

Run container:

docker run -d \
  --name perpendicularity \
  --restart unless-stopped \
  --network host \
  -v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
  perpendicularity:0.1.0 \
  perpendicularity api --config /app/config/agent_config.yaml

Why --network host?

  • Allows container to access Ollama at localhost:11434
  • Container binds directly to host's port 8000
  • Simpler than bridge networking for this use case
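
If you prefer bridge networking, a hedged alternative sketch (assumes Docker 20.10+ for the host-gateway alias, and that you change base_url in agent_config.yaml to http://host.docker.internal:11434/v1):

docker run -d \
  --name perpendicularity \
  --restart unless-stopped \
  -p 8000:8000 \
  --add-host=host.docker.internal:host-gateway \
  -v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
  perpendicularity:0.1.0 \
  perpendicularity api --config /app/config/agent_config.yaml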

Verify it's running:

# Check container status
docker ps

# Should show:
# CONTAINER ID   IMAGE                    STATUS
# abc123...      perpendicularity:0.1.0   Up 30 seconds

# Check logs
docker logs perpendicularity

# Should show startup banner and "Listening on http://0.0.0.0:8000"

# Test API
curl http://localhost:8000/api/health

# Should return:
# {"status":"healthy","service":"perpendicularity-api"}

Step 8: Test from Outside

# On your local machine (not EC2)

# Get EC2 public IP
EC2_IP="xx.xx.xx.xx"  # From AWS console

# Test API
curl http://$EC2_IP:8000/api/health

# Test frontend
open http://$EC2_IP:8000

# Test query
curl -X POST http://$EC2_IP:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is aspirin?",
    "agent_type": "langgraph",
    "model": "ollama_qwen14b",
    "stream": false
  }'

📊 Monitoring

GPU Monitoring

# Real-time GPU monitoring
nvtop

# Or nvidia-smi
watch -n 1 nvidia-smi

# Key metrics:
# - GPU utilization (should be high during inference)
# - Memory usage (should not exceed 24GB)
# - Temperature (should stay < 85°C)
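
For a lightweight usage log you can sample the same metrics into a CSV (the --query-gpu and -l options are standard nvidia-smi flags):

# Append utilization, memory and temperature every 5 seconds
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,temperature.gpu \
  --format=csv -l 5 >> gpu_usage.csv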

Container Monitoring

# Container resource usage
docker stats perpendicularity

# Logs
docker logs -f perpendicularity

# Last 100 lines
docker logs --tail 100 perpendicularity

💰 Cost Optimization

1. Use Spot Instances

Save 60-70% with Spot instances:

# Launch spot instance
aws ec2 run-instances \
  --instance-type g5.xlarge \
  --instance-market-options 'MarketType=spot,SpotOptions={MaxPrice=0.50}' \
  ...

Pros:

  • 60-70% cheaper
  • Good for development/testing

Cons:

  • Can be interrupted (2-minute warning)
  • Less reliable for production
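
The 2-minute interruption warning is exposed through the instance metadata service; a minimal check sketch (assumes IMDSv2, the default on recent launches):

# Request an IMDSv2 token, then poll the spot interruption notice
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/spot/instance-action
# HTTP 404 means no interruption is scheduled; otherwise a JSON action/time is returned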

2. Stop When Not in Use

# Stop instance (preserves data, no compute charges)
aws ec2 stop-instances --instance-ids i-xxxxx

# Start when needed
aws ec2 start-instances --instance-ids i-xxxxx

# Costs when stopped:
# - EBS storage: $0.08/GB/month = $8/month for 100GB
# - No compute charges
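
Note that the public IP changes on every stop/start unless you attach an Elastic IP; a short sketch:

# Allocate an Elastic IP and associate it so the address survives stop/start
ALLOC_ID=$(aws ec2 allocate-address --query 'AllocationId' --output text)
aws ec2 associate-address --instance-id i-xxxxx --allocation-id $ALLOC_ID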

3. Use Smaller Models

# Use 7B model instead of 14B
default_model: "ollama_qwen7b"

# Or use cloud models for low-volume
default_model: "gemini"  # $0.10/query vs $24/day EC2

4. Reserved Instances

For long-term use (1-3 years):

# Find a 1-year offering ID for g5.xlarge
aws ec2 describe-reserved-instances-offerings \
  --instance-type g5.xlarge \
  --product-description "Linux/UNIX"

# Purchase it (the purchase command takes the offering ID, not an instance type)
aws ec2 purchase-reserved-instances-offering \
  --instance-count 1 \
  --reserved-instances-offering-id xxxxx

# Savings: ~40% vs on-demand

🔄 Maintenance

Update Perpendicularity

# SSH to EC2
cd perpendicularity

# Pull latest code
git pull

# Rebuild Docker image
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 .

# Stop old container
docker stop perpendicularity
docker rm perpendicularity

# Start new container
docker run -d --name perpendicularity --network host \
  -v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
  perpendicularity:0.1.0 \
  perpendicularity api --config /app/config/agent_config.yaml

Update Ollama Models

# Update model
ollama pull qwen2.5:14b-instruct

# Restart container (to pick up new model)
docker restart perpendicularity

Backup Configuration

# Backup config to S3
aws s3 cp config/agent_config.yaml s3://my-bucket/perpendicularity/

# Restore
aws s3 cp s3://my-bucket/perpendicularity/agent_config.yaml config/
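
To make the backup automatic, a cron sketch (assumes the instance has credentials that allow s3 cp, e.g. an IAM instance profile, and that the repo lives in /home/ubuntu/perpendicularity):

# Back up the config to S3 every day at 02:00
(crontab -l 2>/dev/null; echo "0 2 * * * aws s3 cp /home/ubuntu/perpendicularity/config/agent_config.yaml s3://my-bucket/perpendicularity/") | crontab -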

🐛 Troubleshooting

Ollama Not Accessible from Container

# Check Ollama is running
sudo systemctl status ollama
curl http://localhost:11434/api/tags

# Verify container can access host
docker exec perpendicularity curl http://localhost:11434/api/tags

# If fails, check network mode
docker inspect perpendicularity | grep NetworkMode
# Should show: "NetworkMode": "host"

GPU Not Detected

# Check NVIDIA driver
nvidia-smi

# If not found, install driver
sudo apt install -y nvidia-driver-525
sudo reboot

Out of Memory

# Check VRAM usage
nvidia-smi

# Use smaller model
ollama pull qwen2.5:7b-instruct

# Or use 4-bit quantization for larger models
# (Ollama does this automatically)
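
If two large models end up resident at once, you can evict one without restarting Ollama; a sketch using the documented keep_alive parameter of the generate API (keep_alive: 0 unloads the model immediately):

# Unload a model from VRAM right away
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:32b-instruct",
  "keep_alive": 0
}'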

Container Won't Start

# Check logs
docker logs perpendicularity

# Common issues:
# 1. Config file not found
docker exec perpendicularity cat /app/config/agent_config.yaml

# 2. Port already in use
sudo lsof -i :8000

# 3. Ollama not running
curl http://localhost:11434/api/tags

📚 Additional Resources


Deploy Perpendicularity cost-effectively on EC2! 🚀

For questions, see Troubleshooting or open an issue.