Complete guide for deploying Perpendicularity on AWS EC2 with Ollama for cost-effective local model inference.
This guide shows you how to:
- Launch an EC2 instance with GPU
- Install Ollama and pull models
- Deploy Perpendicularity in Docker
- Configure security and networking
- Monitor and optimize costs
Result: a self-hosted Perpendicularity instance running local models, with no per-query API cost beyond the infrastructure itself.
| Instance Type | vCPU | RAM | GPU | VRAM | Cost/Hour | Cost/Month* |
|---|---|---|---|---|---|---|
| g5.xlarge | 4 | 16GB | 1x A10G | 24GB | $1.006 | $730 |
| g5.2xlarge | 8 | 32GB | 1x A10G | 24GB | $1.212 | $880 |
| g5.4xlarge | 16 | 64GB | 1x A10G | 24GB | $1.624 | $1,180 |
*24/7 operation. Use Spot instances for 60-70% savings.
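The Cost/Month column is just the hourly rate times roughly 730 hours in a month. A quick sketch (on-demand rate from the table above; the Spot discount is an assumption in the 60-70% range, not a quote):

```shell
# Monthly cost from an hourly on-demand rate, assuming 24/7 (~730 hr/month).
hourly=1.006   # g5.xlarge on-demand rate from the table
monthly=$(awk -v h="$hourly" 'BEGIN { printf "%.0f", h * 730 }')
echo "g5.xlarge 24/7: ~\$${monthly}/month"

# Illustrative Spot estimate at a 65% discount (actual Spot prices vary).
spot=$(awk -v m="$monthly" 'BEGIN { printf "%.0f", m * 0.35 }')
echo "Spot estimate:  ~\$${spot}/month"
```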
g5.xlarge (24GB VRAM):
- ✅ 7B models (4-bit): ~4GB
- ✅ 14B models (8-bit): ~8GB
- ✅ 14B models (4-bit): ~7GB
- ✅ 32B models (4-bit): ~18GB
- ❌ 70B models: Need g5.12xlarge (4x GPUs, 96GB VRAM)
Recommendation: g5.xlarge (~$1/hr) covers most use cases.
Assuming 1,000 complex queries/day:
| Model | Daily Cost | Monthly Cost | Break-Even |
|---|---|---|---|
| g5.xlarge (24/7) | $24 | $730 | Baseline |
| Gemini Flash | $300 | $9,000 | 2.5 days |
| Claude Sonnet | $1,200 | $36,000 | 0.6 days |
Conclusion: EC2 with Ollama pays for itself in < 3 days for high-volume workloads.
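The break-even figures are just the monthly EC2 cost divided by the daily cloud-API spend. Sketched with the table's numbers (the table rounds slightly differently):

```shell
# Days of cloud-API spend needed to cover one month of g5.xlarge ($730).
ec2_monthly=730
for entry in "Gemini Flash:300" "Claude Sonnet:1200"; do
  name=${entry%:*}
  daily=${entry#*:}
  days=$(awk -v m="$ec2_monthly" -v d="$daily" 'BEGIN { printf "%.1f", m / d }')
  echo "$name breaks even after ~${days} days"
done

# Effective EC2 cost per query at the assumed 1,000 queries/day:
perq=$(awk 'BEGIN { printf "%.3f", 730 / (1000 * 30) }')
echo "Effective EC2 cost: ~\$${perq}/query"
```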
# 1. Launch EC2 instance (us-east-1)
# Instance type: g5.xlarge
# AMI: Ubuntu 22.04 LTS
# Storage: 100GB gp3
# Security group: Allow 22 (SSH), 8000 (API), 11434 (Ollama)
# 2. SSH into instance
ssh -i your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com
# 3. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 4. Pull models
ollama pull qwen2.5:14b-instruct
ollama pull deepseek-r1:8b
# 5. Install Docker
sudo apt update
sudo apt install -y docker.io
sudo systemctl start docker
sudo usermod -aG docker ubuntu
# Log out and back in for group changes
# 6. Clone Perpendicularity
git clone https://github.com/t-neumann/perpendicularity.git
cd perpendicularity
# 7. Configure MCP servers (edit URLs)
nano config/agent_config.yaml
# 8. Build and run
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 .
docker run -d --name perpendicularity --network host \
-v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
perpendicularity:0.1.0 \
perpendicularity api --config /app/config/agent_config.yaml
# 9. Access
# Frontend: http://EC2-PUBLIC-IP:8000
# API Docs: http://EC2-PUBLIC-IP:8000/docs
Go to EC2 Dashboard → Launch Instance

1. Name and Tags:
   - Name: perpendicularity-gpu

2. Application and OS Images:
   - AMI: Deep Learning Base AMI with Single CUDA (Ubuntu 22.04)
   - Architecture: 64-bit (x86)

3. Instance Type:
   - Filter: Show "GPU instances"
   - Family: Accelerated Computing
   - Type: g5.xlarge (4 vCPU, 16GB RAM, 24GB VRAM)

4. Key Pair:
   - Create a new key pair or use an existing one
   - Download the .pem file if creating a new one
   - Set permissions: chmod 400 your-key.pem

5. Network Settings:
   - VPC: Default (or custom)
   - Subnet: No preference (or a specific one)
   - Auto-assign public IP: Enable
   - Firewall (Security Group): Create new
     - Allow SSH (22) from My IP
     - Allow Custom TCP (8000) from Anywhere (0.0.0.0/0)
     - Allow Custom TCP (11434) from My IP (for Ollama admin)

6. Configure Storage:
   - Size: 100 GB (models need space)
   - Type: gp3 (better performance than gp2)
   - IOPS: 3000 (default)
   - Throughput: 125 MB/s (default)

7. Advanced Details:
   - Spot instance: Consider for 60-70% savings (may be interrupted)
   - IAM instance profile: None needed (unless using AWS services)

8. Launch Instance
# Create security group
aws ec2 create-security-group \
--group-name perpendicularity-sg \
--description "Perpendicularity API and Ollama" \
--vpc-id vpc-xxxxx
# Add inbound rules
MY_IP=$(curl -s https://checkip.amazonaws.com)
SG_ID=sg-xxxxx
aws ec2 authorize-security-group-ingress \
--group-id $SG_ID \
--protocol tcp --port 22 --cidr $MY_IP/32
aws ec2 authorize-security-group-ingress \
--group-id $SG_ID \
--protocol tcp --port 8000 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress \
--group-id $SG_ID \
--protocol tcp --port 11434 --cidr $MY_IP/32
# Launch instance (image is Ubuntu 22.04 LTS, us-east-1)
aws ec2 run-instances \
--image-id ami-0c7217cdde317cfec \
--instance-type g5.xlarge \
--key-name your-key-name \
--security-group-ids $SG_ID \
--block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=perpendicularity-gpu}]'

# Get public IP
aws ec2 describe-instances \
--filters "Name=tag:Name,Values=perpendicularity-gpu" \
--query 'Reservations[0].Instances[0].PublicIpAddress' \
--output text
# SSH (replace with your IP and key)
ssh -i your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com

# Update system
sudo apt update && sudo apt upgrade -y
# Install essentials (nvtop provides GPU monitoring)
sudo apt install -y \
git \
curl \
wget \
htop \
nvtop \
build-essential
# Verify GPU
nvidia-smi
# Should show:
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 525.xx.xx Driver Version: 525.xx.xx CUDA Version: 12.0 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | 0 NVIDIA A10G Off | 00000000:00:1E.0 Off | 0 |
# +-------------------------------+----------------------+----------------------+

# Install Ollama (installs as systemd service)
curl -fsSL https://ollama.com/install.sh | sh
# Verify service is running
sudo systemctl status ollama
# Should show:
# ● ollama.service - Ollama Service
# Loaded: loaded (/etc/systemd/system/ollama.service; enabled)
# Active: active (running) since ...
# Verify API is accessible
curl http://localhost:11434/api/tags
# Should return:
# {"models":[]}

Ollama is now running as a systemd service:
- Starts automatically on boot
- Runs on port 11434
- No need to manually run ollama serve
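One caveat: the installer's systemd unit binds Ollama to 127.0.0.1 by default, so the port-11434 security-group rule only matters if you also tell Ollama to listen externally. A sketch of the override, assuming you actually need remote access (OLLAMA_HOST is the environment variable Ollama reads):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Create via: sudo systemctl edit ollama
# Then apply: sudo systemctl restart ollama
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
```

If you keep the default localhost binding, the container still reaches Ollama fine because it runs with --network host.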
# Recommended: Qwen 2.5 14B (best balance)
ollama pull qwen2.5:14b-instruct
# Also recommended: DeepSeek R1 8B (reasoning specialist)
ollama pull deepseek-r1:8b
# Optional: Qwen 2.5 32B (highest quality, slower)
ollama pull qwen2.5:32b-instruct
# Optional: Qwen 2.5 7B (fastest)
ollama pull qwen2.5:7b-instruct
# Verify models
ollama list
# Should show:
# NAME ID SIZE MODIFIED
# qwen2.5:14b-instruct abc123def 8.0 GB 2 minutes ago
# deepseek-r1:8b def456ghi 4.7 GB 1 minute ago

VRAM Usage:
- qwen2.5:7b → ~4GB
- deepseek-r1:8b → ~5GB
- qwen2.5:14b → ~8GB
- qwen2.5:32b → ~18GB
Tip: Monitor VRAM usage with nvtop or nvidia-smi
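As a rough rule of thumb (an estimate, not a measured figure): a 4-bit quantized model needs about 0.5 GB per billion parameters for weights, plus 1-2 GB of runtime/KV-cache overhead, which roughly matches the list above:

```shell
# Rough 4-bit VRAM estimate: 0.5 GB per billion params + ~1.5 GB overhead.
# Actual usage varies with context length and quantization details.
estimate_vram() {
  awk -v p="$1" 'BEGIN { printf "%.1f", p * 0.5 + 1.5 }'
}
for p in 7 8 14 32; do
  echo "${p}B model, 4-bit: ~$(estimate_vram "$p") GB"
done
```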
# Install Docker
sudo apt update
sudo apt install -y docker.io
# Start Docker service
sudo systemctl start docker
sudo systemctl enable docker
# Add user to docker group (avoid sudo)
sudo usermod -aG docker ubuntu
# Log out and back in for group changes to take effect
exit
ssh -i your-key.pem ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com
# Verify Docker works without sudo
docker --version
docker ps
# Should work without "permission denied"

# Clone repository
git clone https://github.com/t-neumann/perpendicularity.git
cd perpendicularity
# Configure agent_config.yaml
nano config/agent_config.yaml

Edit configuration:
# Set default model to local Ollama
default_model: "ollama_qwen14b"
# Verify Ollama models are configured
models:
  defaults:
    openai:
      base_url: "http://localhost:11434/v1" # Ollama endpoint
  ollama_qwen14b:
    type: "openai"
    name: "qwen2.5:14b-instruct"
  ollama_deepseek:
    type: "openai"
    name: "deepseek-r1:8b"
# Update MCP server URLs to your actual instances
mcp_servers:
  genomic_ops:
    url: "http://YOUR-GENOMIC-SERVER-IP:8000/mcp"
  txgemma:
    url: "http://YOUR-TXGEMMA-SERVER-IP:8000/mcp"

Build Docker image:
docker build -t perpendicularity:0.1.0 .
# Build takes 3-5 minutes
# Watch for errors (should complete successfully)

Run container:
docker run -d \
--name perpendicularity \
--restart unless-stopped \
--network host \
-v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
perpendicularity:0.1.0 \
perpendicularity api --config /app/config/agent_config.yaml

Why --network host?
- Allows the container to access Ollama at localhost:11434
- Container binds directly to the host's port 8000
- Simpler than bridge networking for this use case
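If you prefer bridge networking (for example, to avoid exposing every container port on the host), a hypothetical alternative is to publish port 8000 and reach the host's Ollama via host.docker.internal; on Linux that requires the extra --add-host flag (Docker 20.10+), and base_url in agent_config.yaml would change to http://host.docker.internal:11434/v1. A dry-run sketch, not the guide's tested setup:

```shell
# Build the alternative run command and print it (dry run).
# Drop the final echo and paste the command to actually execute it.
cmd="docker run -d --name perpendicularity \
  --restart unless-stopped \
  --add-host host.docker.internal:host-gateway \
  -p 8000:8000 \
  -v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
  perpendicularity:0.1.0 \
  perpendicularity api --config /app/config/agent_config.yaml"
echo "$cmd"
```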
Verify it's running:
# Check container status
docker ps
# Should show:
# CONTAINER ID IMAGE STATUS
# abc123... perpendicularity:0.1.0 Up 30 seconds
# Check logs
docker logs perpendicularity
# Should show startup banner and "Listening on http://0.0.0.0:8000"
# Test API
curl http://localhost:8000/api/health
# Should return:
# {"status":"healthy","service":"perpendicularity-api"}

# On your local machine (not EC2)
# Get EC2 public IP
EC2_IP="xx.xx.xx.xx" # From AWS console
# Test API
curl http://$EC2_IP:8000/api/health
# Test frontend
open http://$EC2_IP:8000
# Test query
curl -X POST http://$EC2_IP:8000/api/chat \
-H "Content-Type: application/json" \
-d '{
"question": "What is aspirin?",
"agent_type": "langgraph",
"model": "ollama_qwen14b",
"stream": false
}'

# Real-time GPU monitoring
nvtop
# Or nvidia-smi
watch -n 1 nvidia-smi
# Key metrics:
# - GPU utilization (should be high during inference)
# - Memory usage (should not exceed 24GB)
# - Temperature (should stay < 85°C)

# Container resource usage
docker stats perpendicularity
# Logs
docker logs -f perpendicularity
# Last 100 lines
docker logs --tail 100 perpendicularity

Save 60-70% with Spot instances:
# Launch spot instance
aws ec2 run-instances \
--instance-type g5.xlarge \
--instance-market-options 'MarketType=spot,SpotOptions={MaxPrice=0.50}' \
...

Pros:
- 60-70% cheaper
- Good for development/testing
Cons:
- Can be interrupted (2-minute warning)
- Less reliable for production
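To sanity-check the 60-70% figure: Spot prices fluctuate by region and time, but at an illustrative $0.35/hr (an assumed rate, not a quote) the math works out like this:

```shell
# Illustrative Spot math for g5.xlarge (rates are assumptions, not quotes).
ondemand=1.006
spot_rate=0.35
savings=$(awk -v o="$ondemand" -v s="$spot_rate" 'BEGIN { printf "%.0f", (1 - s / o) * 100 }')
spot_monthly=$(awk -v s="$spot_rate" 'BEGIN { printf "%.1f", s * 730 }')
echo "Spot at \$${spot_rate}/hr: ~\$${spot_monthly}/month (~${savings}% off on-demand)"
```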
# Stop instance (preserves data, no compute charges)
aws ec2 stop-instances --instance-ids i-xxxxx
# Start when needed
aws ec2 start-instances --instance-ids i-xxxxx
# Costs when stopped:
# - EBS storage: $0.08/GB/month = $8/month for 100GB
# - No compute charges

# Use 7B model instead of 14B
default_model: "ollama_qwen7b"
# Or use cloud models for low-volume
default_model: "gemini" # $0.10/query vs $24/day EC2

For long-term use (1-3 years):
# 1-year reserved instance
aws ec2 purchase-reserved-instances-offering \
--instance-type g5.xlarge \
--instance-count 1 \
--reserved-instances-offering-id xxxxx
# Savings: ~40% vs on-demand

# SSH to EC2
cd perpendicularity
# Pull latest code
git pull
# Rebuild Docker image
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 .
# Stop old container
docker stop perpendicularity
docker rm perpendicularity
# Start new container
docker run -d --name perpendicularity --network host \
-v $(pwd)/config/agent_config.yaml:/app/config/agent_config.yaml:ro \
perpendicularity:0.1.0 \
perpendicularity api --config /app/config/agent_config.yaml

# Update model
ollama pull qwen2.5:14b-instruct
# Restart container (to pick up new model)
docker restart perpendicularity

# Backup config to S3
aws s3 cp config/agent_config.yaml s3://my-bucket/perpendicularity/
# Restore
aws s3 cp s3://my-bucket/perpendicularity/agent_config.yaml config/

# Check Ollama is running
sudo systemctl status ollama
curl http://localhost:11434/api/tags
# Verify container can access host
docker exec perpendicularity curl http://localhost:11434/api/tags
# If fails, check network mode
docker inspect perpendicularity | grep NetworkMode
# Should show: "NetworkMode": "host"

# Check NVIDIA driver
nvidia-smi
# If not found, install driver
sudo apt install -y nvidia-driver-525
sudo reboot

# Check VRAM usage
nvidia-smi
# Use smaller model
ollama pull qwen2.5:7b-instruct
# Or use 4-bit quantization for larger models
# (Ollama does this automatically)

# Check logs
docker logs perpendicularity
# Common issues:
# 1. Config file not found
docker exec perpendicularity cat /app/config/agent_config.yaml
# 2. Port already in use
sudo lsof -i :8000
# 3. Ollama not running
curl http://localhost:11434/api/tags

- Deployment Guide - General deployment
- Models Guide - Model selection
- Configuration Reference - Config options
- AWS EC2 Pricing - Current rates
- Ollama Models - Available models
Deploy Perpendicularity cost-effectively on EC2! 🚀
For questions, see Troubleshooting or open an issue.