
Troubleshooting Guide

Common issues and solutions for Perpendicularity.


📋 Table of Contents

  • 🔧 Installation Issues
  • ⚙️ Configuration Issues
  • 🤖 Model Issues
  • 🧠 Agent Issues
  • 🔌 MCP Server Issues
  • 🌐 API & Frontend Issues
  • 🐳 Deployment Issues
  • ⚡ Performance Issues
  • 🔍 Debugging Tips
  • 📞 Getting Help

🔧 Installation Issues

"uv: command not found"

Problem: The uv package manager is not installed, or its install location is not on your PATH.

Solution:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Verify installation
uv --version

# Restart shell or source profile
source ~/.bashrc  # or ~/.zshrc

Dependencies Installation Fails

Problem: Error installing dependencies.

Common causes & solutions:

# 1. Missing build tools (Linux)
sudo apt install -y build-essential python3-dev

# 2. Missing build tools (macOS)
xcode-select --install

# 3. Upgrade pip
pip install --upgrade pip setuptools wheel

# 4. Install with extras explicitly
pip install -e ".[api,local-models]"

# 5. Use uv (handles dependencies better)
uv sync --extra api --extra local-models

⚙️ Configuration Issues

"Config file not found"

Problem: FileNotFoundError: config/agent_config.yaml not found

Solutions:

# 1. Specify the config path explicitly
perpendicularity ask "test" --config /full/path/to/config.yaml

# 2. Run from the project root so the relative path resolves
cd /path/to/perpendicularity
perpendicularity ask "test"

"Invalid YAML syntax"

Problem: YAML parsing error.

Solutions:

# 1. Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('config/agent_config.yaml'))"

# 2. Common issues:
# - Tabs instead of spaces (use spaces only!)
# - Missing colons
# - Incorrect indentation
# - Unquoted special characters

# 3. Use YAML validator
yamllint config/agent_config.yaml

# 4. Check for hidden characters
cat -A config/agent_config.yaml | grep -v "^$"
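Since tabs are the most common culprit and are easy to miss, a quick check like the following can flag them (the /tmp/example.yaml file here is a throwaway created just for the demo; point grep at config/agent_config.yaml in practice):

```shell
# Sketch: detect tab characters, which are invalid in YAML indentation.
tab=$(printf '\t')
printf "key:\n${tab}value: 1\n" > /tmp/example.yaml   # throwaway demo file with a tab
grep -n "$tab" /tmp/example.yaml && echo "tabs found - use spaces instead"
```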

"Environment variable not set"

Problem: GOOGLE_API_KEY not found in environment

Solutions:

# 1. Set environment variable
export GOOGLE_API_KEY="your-key-here"
export ANTHROPIC_API_KEY="your-key-here"

# 2. Make permanent (add to ~/.bashrc or ~/.zshrc)
echo 'export GOOGLE_API_KEY="your-key"' >> ~/.bashrc
source ~/.bashrc

# 3. Verify it's set
echo $GOOGLE_API_KEY
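For scripts, the checks above can be rolled into a small helper. check_env is a name made up for this sketch; the variable names are the ones this guide uses:

```shell
# Sketch: report which required environment variables are set.
check_env() {
  for var in "$@"; do
    eval "val=\${$var:-}"
    if [ -n "$val" ]; then
      echo "set: $var"
    else
      echo "missing: $var"
    fi
  done
}

export GOOGLE_API_KEY="your-key-here"   # placeholder, as above
check_env GOOGLE_API_KEY ANTHROPIC_API_KEY
```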

🤖 Model Issues

"Model not found"

Problem: ValueError: Model 'xyz' not found in configuration

Solutions:

# 1. List available models
perpendicularity list-models

# 2. Check spelling (case-sensitive!)
perpendicularity ask "test" --model gemini      # ✅ correct
perpendicularity ask "test" --model Gemini      # ❌ wrong
perpendicularity ask "test" --model gemini-2.5  # ❌ wrong (use config name)

# 3. Add model to config
# Edit config/agent_config.yaml and add model definition

# 4. Use correct model name from config
# Not the API model name, but the config key
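The difference between the two names can be pictured with a hypothetical config fragment (the models:/model_name: field names here are illustrative; check config/agent_config.yaml for the actual schema):

```yaml
models:
  gemini:                              # config key: what --model expects
    model_name: "<provider-model-id>"  # API model name sent to the provider
```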

API Key Errors (Gemini/Claude)

Problem: Invalid API key or Authentication failed

Solutions:

# 1. Verify API key is set
echo $GOOGLE_API_KEY
echo $ANTHROPIC_API_KEY

# 2. Check key validity
# Gemini: https://aistudio.google.com/app/apikey
# Claude: https://console.anthropic.com/

# 3. Check key format
# Gemini keys start with: AIza...
# Claude keys start with: sk-ant-...

# 4. Regenerate key if invalid

# 5. Test with curl
curl -H "x-goog-api-key: $GOOGLE_API_KEY" \
  https://generativelanguage.googleapis.com/v1/models

Ollama Connection Refused

Problem: Connection refused: http://localhost:11434

Solutions:

# 1. Check if Ollama is running
curl http://localhost:11434/api/tags

# 2. If not running, start it
# Check service status
sudo systemctl status ollama

# Start service
sudo systemctl start ollama

# Or run manually
ollama serve &

# 3. Verify port is listening
lsof -i :11434
netstat -tuln | grep 11434

# 4. Check firewall
sudo ufw status
sudo ufw allow 11434/tcp

# 5. Test with simple request
ollama list
ollama run qwen2.5:14b-instruct "Hello"

Ollama Model Not Found

Problem: Model 'qwen2.5:14b-instruct' not found

Solutions:

# 1. List available models
ollama list

# 2. Pull the model
ollama pull qwen2.5:14b-instruct

# 3. Verify it's pulled
ollama list | grep qwen

# 4. Check model name spelling
# Use exact name from ollama list

# 5. Test model works
ollama run qwen2.5:14b-instruct "test"

HuggingFace CUDA Out of Memory

Problem: RuntimeError: CUDA out of memory

Solutions:

# 1. Use more aggressive quantization
# Edit config/agent_config.yaml:
hf_model:
  load_in_4bit: true  # Instead of load_in_8bit

# 2. Use smaller model
perpendicularity ask "test" --model hf_qwen7b  # Instead of 14b

# 3. Check GPU memory
nvidia-smi

# 4. Use CPU (slow but works)
# Edit config:
hf_model:
  device: "cpu"

🧠 Agent Issues

Agent Hangs / Never Completes

Problem: Agent runs forever without finishing.

Solutions:

# 1. Reduce recursion limit (LangGraph)
perpendicularity ask "test" --max-steps 10

# Or in config:
agent:
  recursion_limit: 10

# 2. Check tool execution
# Enable debug logging
perpendicularity ask "test" --debug

# 3. Switch to ReAct (predictable steps)
perpendicularity ask "test" --agent-type react --max-steps 5

# 4. Check if stuck on specific tool
# Review debug output for repeated tool calls

# 5. Simplify query
# Complex queries may need more steps or better prompts

Agent Makes Poor Decisions

Problem: Agent chooses wrong tools or gives bad answers.

Solutions:

# 1. Use better model
perpendicularity ask "query" --model claude  # Best reasoning
perpendicularity ask "query" --model gemini  # Good balance

# 2. Use appropriate prompt
perpendicularity ask "safety query" --prompt conservative
perpendicularity ask "research query" --prompt exploratory
perpendicularity ask "genomics query" --prompt genomics

# 3. Increase reasoning steps
perpendicularity ask "complex query" --max-steps 10

# 4. Use LangGraph (better reasoning)
perpendicularity ask "query" --agent-type langgraph

# 5. Simplify query or break into parts

ReAct Runs Out of Steps

Problem: Maximum steps (5) reached without completion

Solutions:

# 1. Increase max steps
perpendicularity ask "query" --max-steps 10

# 2. Switch to LangGraph (uses a recursion limit instead of a fixed step count)
perpendicularity ask "query" --agent-type langgraph

# 3. Simplify query
# Break complex query into smaller sub-queries

# 4. Check if agent is stuck in a loop
# Enable debug logging to see the reasoning
perpendicularity ask "query" --agent-type react --max-steps 10 --debug

🔌 MCP Server Issues

MCP Connection Failed

Problem: Failed to connect to MCP server

Solutions:

# 1. Verify server is running
curl http://your-mcp-server:8000/mcp

# 2. Check URL in config
# Edit config/agent_config.yaml
mcp_servers:
  genomic_ops:
    url: "http://correct-server:8000/mcp"  # CHECK HERE

# 3. Check network connectivity
ping your-mcp-server
telnet your-mcp-server 8000

# 4. Check firewall
# On server:
sudo ufw allow 8000/tcp

# 5. Test with verbose logging
perpendicularity ask "test" --debug

MCP Server Timeout

Problem: Tool execution timeout after 180 seconds

Solutions:

# 1. Increase timeout in config
# Edit config/agent_config.yaml:
agent:
  tool_timeout_seconds: 300  # 5 minutes

# 2. Check server logs
# On MCP server, check why it's slow

# 3. Use faster server/model
# Some MCP operations may be slow

# 4. Optimize query
# Some genomic operations on large regions may timeout

No Tools Available

Problem: Agent has no tools / tools list is empty

Solutions:

# 1. Check MCP servers configured
# Edit config/agent_config.yaml
mcp_servers:
  genomic_ops:
    url: "http://server:8000/mcp"

# 2. Test MCP connection manually
curl http://your-server:8000/mcp

# 3. Check server returns tools
# Server should respond with tool list

# 4. Enable debug logging
perpendicularity ask "test" --debug | grep -i tool

# 5. Verify transport type
mcp_servers:
  server:
    transport: "streamable-http"  # Correct for most servers

🌐 API & Frontend Issues

API Server Won't Start

Problem: Error starting server or port already in use

Solutions:

# 1. Check if port 8000 in use
lsof -i :8000
netstat -tuln | grep 8000

# 2. Kill existing process
kill -9 $(lsof -t -i:8000)

# 3. Use different port
perpendicularity api --port 3000

# 4. Check logs for errors
perpendicularity api --log-level debug

# 5. Verify dependencies installed
pip show fastapi uvicorn

CORS Errors in Frontend

Problem: CORS policy: No 'Access-Control-Allow-Origin' header

Solutions:

# 1. Check CORS configuration in api/main.py
# Should include your frontend URL:
allow_origins=[
    "http://localhost:5173",  # Vite default
    "http://localhost:3000",  # Alternative
]

# 2. Add your origin to list (if different)
# Edit api/main.py and rebuild

# 3. For production, use proper domain
allow_origins=[
    "https://your-domain.com",
]

# 4. Restart API server after changes

Frontend "Failed to Load"

Problem: Frontend loads but can't connect to API

Solutions:

# 1. Check API is running
curl http://localhost:8000/api/health

# 2. Check frontend API URL
# Edit frontend/.env.local (if running locally):
VITE_API_URL=http://localhost:8000

# Or for remote API:
VITE_API_URL=http://ec2-xx-xx-xx-xx.compute.amazonaws.com:8000

# 3. Restart frontend dev server
cd frontend
npm run dev

# 4. Check browser console for errors
# Open DevTools (F12) → Console tab

# 5. Verify network requests
# DevTools → Network tab → Look for failed requests

🐳 Deployment Issues

Docker Build Fails

Problem: Error during docker build

Solutions:

# 1. Check Dockerfile exists
ls Dockerfile

# 2. Increase Docker memory
# Docker Desktop → Settings → Resources → Memory → 8GB+

# 3. Build with verbose output
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 . --progress=plain

# 4. Clear Docker cache
docker builder prune -a

# 5. Build stages separately
docker buildx build --platform linux/amd64 --target frontend-builder -t perp-frontend .
docker buildx build --platform linux/amd64 -t perpendicularity:0.1.0 .

Docker Container Won't Start

Problem: Container exits immediately

Solutions:

# 1. Check container logs
docker logs perpendicularity

# 2. Run interactively for debugging
docker run -it perpendicularity:0.1.0 /bin/bash

# 3. Check config file mounted correctly
docker exec perpendicularity ls -la /app/config/

# 4. Verify environment variables
docker exec perpendicularity env | grep API_KEY

# 5. Check for port conflicts
docker ps -a | grep 8000

EC2 Deployment: Can't Connect

Problem: Can't access EC2 instance on port 8000

Solutions:

# 1. Check security group rules
# AWS Console → EC2 → Security Groups
# Ensure port 8000 is open to 0.0.0.0/0 (or your IP)

# 2. Check service is running on EC2
# SSH to instance:
curl http://localhost:8000/api/health

# 3. Check EC2 public IP
aws ec2 describe-instances --instance-ids i-xxxxx

# 4. Test from local machine
curl http://EC2_PUBLIC_IP:8000/api/health

# 5. Check firewall on EC2
sudo ufw status
sudo ufw allow 8000/tcp

Ollama on EC2: Connection Issues

Problem: Docker container can't reach Ollama on EC2

Solutions:

# 1. Verify Ollama running
sudo systemctl status ollama
curl http://localhost:11434/api/tags

# 2. Check Docker network mode
# Must use --network host
docker run --network host ...

# 3. Verify from inside container
docker exec perpendicularity curl http://localhost:11434/api/tags

# 4. Check firewall
sudo ufw status
# Port 11434 should be allowed from localhost (not internet!)

# 5. Restart both services
sudo systemctl restart ollama
docker restart perpendicularity

⚡ Performance Issues

Slow Response Times

Problem: Queries take very long to complete

Solutions:

# 1. Use faster model
perpendicularity ask "query" --model gemini  # Faster
perpendicularity ask "query" --model ollama_qwen7b  # Faster local

# 2. Reduce reasoning steps
perpendicularity ask "query" --max-steps 3

# 3. Check network latency to MCP servers
ping your-mcp-server

# 4. Use local models for development
# Ollama is faster than cloud for repeated queries

# 5. Enable parallel tool execution (future)
# Currently tools run sequentially

High Memory Usage

Problem: Process uses too much RAM

Solutions:

# 1. Use smaller local model
ollama run qwen2.5:7b-instruct  # Instead of 14b or 32b

# 2. Use 4-bit quantization
# In config:
hf_model:
  load_in_4bit: true

# 3. Monitor memory
htop
watch -n 1 'nvidia-smi'  # For GPU

# 4. Restart service periodically
# Memory leaks may accumulate over time

# 5. Use cloud models (no local memory needed)
perpendicularity ask "query" --model gemini

Rate Limits

Problem: Rate limit exceeded errors

Solutions:

# 1. Use different model
# Switch between Gemini and Claude when hitting limits

# 2. Implement backoff
# Wait before retrying

# 3. Use local models (no rate limits)
perpendicularity ask "query" --model ollama_qwen14b

# 4. Upgrade API plan
# Gemini: Increase quota at Google AI Studio
# Claude: Upgrade at Anthropic Console

# 5. Batch queries instead of rapid-fire
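A simple exponential backoff can be scripted around the CLI. retry here is a made-up wrapper name; the perpendicularity invocation in the comment is just the usage from this guide:

```shell
# Sketch: retry a command with exponential backoff (2s, 4s, 8s, ...).
retry() {
  max=$1; shift
  attempt=1
  delay=2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $max attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage: retry 5 perpendicularity ask "query" --model gemini
retry 3 true && echo "succeeded"
```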

🔍 Debugging Tips

Enable Debug Logging

# CLI
perpendicularity ask "test" --debug

# API
perpendicularity api --log-level debug

# Check logs
tail -f /var/log/perpendicularity.log  # If configured

Dry Run Mode

# Test configuration without executing
perpendicularity ask "test" --dry-run

# Shows:
# - Loaded config
# - Selected model
# - Available tools
# - But doesn't execute query

Verbose Output

# See step-by-step reasoning
perpendicularity ask "query" --agent-type react --max-steps 5

# LangGraph shows recursions
# ReAct shows numbered steps

Check Component Status

# List models
perpendicularity list-models

# List prompts
perpendicularity list-prompts

# Test API health
curl http://localhost:8000/api/health

# Test config
python -c "from agent.config import AgentConfig; c = AgentConfig('config/agent_config.yaml'); print('OK')"

📞 Getting Help

Before Opening an Issue

  1. Check this troubleshooting guide
  2. Search existing issues on GitHub
  3. Enable debug logging and collect output
  4. Test with minimal example to isolate problem
  5. Check if configuration is valid

Opening an Issue

Include:

  • Perpendicularity version: perpendicularity --version
  • Python version: python --version
  • Operating system: uname -a or Windows version
  • Configuration (sanitized - no API keys!)
  • Full error message with stack trace
  • Steps to reproduce
  • Debug logs (if applicable)

GitHub Issues: https://github.com/t-neumann/perpendicularity/issues


Still stuck? Open an issue with debug logs! 🆘