Linux Setup Guide — GPU Machine

Switching from Windows/WSL to native Linux raises inference throughput from 3-5 tok/s to 60-80 tok/s, roughly a 10-20x speedup.

What you need

  • USB stick (8GB+)
  • Monitor + keyboard (for install only — headless after)
  • Internet connection

Step 1: Create bootable USB

On any Windows machine:

  1. Download Ubuntu Server 24.04 LTS from https://ubuntu.com/download/server
  2. Download Rufus from https://rufus.ie
  3. Flash the ISO to USB with Rufus (default settings are fine)

Step 2: Install Ubuntu

  1. Plug the USB stick into the GPU machine and boot from it (the boot menu key is usually F12 or Del)
  2. Choose Ubuntu Server (minimized)
  3. Choose to use the entire disk. This wipes Windows, so back up anything you need first
  4. Enable the OpenSSH server when prompted
  5. Set a username and password
  6. Note the IP address shown after install, or set a static one
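
If the address was not noted during install, it can be read from the machine's console afterwards. A minimal sketch, assuming the iproute2 `ip` tool is available (it normally is on Ubuntu Server); `first_ipv4` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: print the first non-loopback IPv4 address found in
# `ip -4 -brief addr show` output (columns: interface, state, address/prefix).
first_ipv4() {
  awk '$1 != "lo" && $3 != "" { split($3, parts, "/"); print parts[1]; exit }'
}

# Run on the GPU machine's console; skipped where `ip` is unavailable.
if command -v ip >/dev/null 2>&1; then
  ip -4 -brief addr show | first_ipv4
fi
```

The printed address is what goes in place of `machine-ip` in the SSH commands that follow.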

Step 3: Post-install setup (SSH in from another machine)

ssh youruser@machine-ip

Install NVIDIA drivers

# The RTX 5090 (Blackwell) requires NVIDIA's open kernel modules, hence the -open package
sudo apt update && sudo apt install -y nvidia-driver-570-open
sudo reboot

Verify GPU (after reboot)

ssh youruser@machine-ip
nvidia-smi  # Should show RTX 5090
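
The GPU check can also be scripted instead of read by eye. A sketch, assuming the card reports a name containing "RTX 5090"; `has_gpu` is a hypothetical helper, while `--query-gpu` and `--format` are standard `nvidia-smi` options:

```shell
# Hypothetical helper: succeed if any line on stdin contains the expected GPU model.
has_gpu() {
  grep -qi -- "$1"
}

# The guard skips the check on machines without the driver installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  if nvidia-smi --query-gpu=name --format=csv,noheader | has_gpu "RTX 5090"; then
    echo "GPU OK"
  else
    echo "Expected GPU not found" >&2
  fi
fi
```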

Install Docker + NVIDIA Container Toolkit

# Docker
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
newgrp docker

# NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU in Docker
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi
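
If the container test fails, a quick first check is whether `nvidia-ctk runtime configure` actually registered the `nvidia` runtime in Docker's daemon config. A sketch, assuming the default config path `/etc/docker/daemon.json`; `has_nvidia_runtime` is a hypothetical helper:

```shell
# Hypothetical helper: succeed if the given Docker daemon config mentions an
# "nvidia" runtime. Defaults to the standard path when no argument is given.
has_nvidia_runtime() {
  config_file="${1:-/etc/docker/daemon.json}"
  [ -r "$config_file" ] && grep -q '"nvidia"' "$config_file"
}

if has_nvidia_runtime; then
  echo "nvidia runtime registered"
else
  echo "nvidia runtime not found in daemon.json" >&2
fi
```

If the runtime is missing, re-running the `nvidia-ctk` and `systemctl restart docker` lines is usually the next thing to try.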

Install Python + project

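sudo apt install -y python3.12 python3-pip python3.12-venv git
git clone <your-repo-url> ~/vibercoded
cd ~/vibercoded
# Ubuntu 24.04 blocks system-wide pip installs (PEP 668), so install into a venv
python3 -m venv .venv
source .venv/bin/activate  # repeat in every new shell before running project commands
pip install -e ".[dev]"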
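
Ubuntu 24.04 marks the system Python as externally managed (PEP 668), so `pip install` is expected to be refused outside a virtual environment. A small guard to run before project commands, assuming the venv is activated through its `activate` script (which sets `VIRTUAL_ENV`); `in_venv` is a hypothetical helper:

```shell
# Hypothetical helper: succeed if a virtual environment is active in this shell.
in_venv() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
  echo "Using virtual environment: $VIRTUAL_ENV"
else
  echo "No virtual environment active; create one with: python3 -m venv .venv" >&2
fi
```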

Install Tailscale (remote access)

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

Follow the link to authenticate. Your team then reaches the machine via its Tailscale IP.
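
`tailscale ip -4` prints the machine's Tailscale IPv4 address, which teammates would use in place of `machine-ip`. Tailscale hands out addresses from the 100.64.0.0/10 range, so a rough sanity check is possible; `is_tailscale_ip` is a hypothetical helper:

```shell
# Hypothetical helper: succeed if the argument is a dotted-quad address inside
# 100.64.0.0/10 (first octet 100, second octet 64..127).
is_tailscale_ip() {
  printf '%s\n' "$1" |
    awk -F. 'NF == 4 && $1 == 100 && $2 >= 64 && $2 <= 127 { ok = 1 } END { exit !ok }'
}

# Skipped on machines where Tailscale is not installed.
if command -v tailscale >/dev/null 2>&1; then
  ts_ip=$(tailscale ip -4 2>/dev/null | head -n 1)
  if is_tailscale_ip "$ts_ip"; then
    echo "Team connects with: ssh youruser@$ts_ip"
  fi
fi
```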

Start everything

cd ~/vibercoded

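# Start vLLM (the subshell keeps the working directory at ~/vibercoded even if compose fails)
(cd gpu-machine && docker compose up -d)

# Wait for model to load (~2-3 min); -f makes curl fail on HTTP error responses
until curl -sf http://localhost:8000/health > /dev/null 2>&1; do sleep 5; done
echo "vLLM ready"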

# Start AgentCreator in the background (output goes to data/server.log)
mkdir -p data
nohup python3 -c "from agent_creator.main import main; main()" > data/server.log 2>&1 &
echo "AgentCreator starting on port 7000"
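
The `until` loop above waits forever if vLLM never comes up, for example when the container exits during startup. A variant with a timeout is sketched below, assuming the same `/health` endpoint on port 8000; `wait_until` is a hypothetical helper, not part of the project:

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds. Returns 1 once the time spent sleeping
# reaches TIMEOUT_SECONDS (the time COMMAND itself takes is not counted).
wait_until() {
  wu_timeout=$1
  wu_interval=$2
  shift 2
  wu_waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$wu_waited" -ge "$wu_timeout" ]; then
      return 1
    fi
    sleep "$wu_interval"
    wu_waited=$((wu_waited + wu_interval))
  done
  return 0
}

# Example (commented out so that defining the function has no side effects):
# wait_until 600 5 curl -sf http://localhost:8000/health && echo "vLLM ready" \
#   || docker logs gpu-machine-vllm-1 --tail 20
```

On timeout, the example falls through to printing the last container log lines, which is usually where a failed model load shows up.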

What changes vs Windows

| | Windows/WSL | Native Linux |
|---|---|---|
| Translation request | ~2,300ms | ~300ms |
| Code review | ~5,000ms | ~800ms |
| Tokens/sec | 3-5 | 60-80 |
| GPU clock locking | Required (resets on reboot) | Not needed (full speed automatic) |
| pin_memory | Disabled (WSL limitation) | Enabled (full speed) |
| Docker GPU access | Via WSL2 layer | Direct |
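
The speedups implied by these figures can be worked out directly. For latency, the ratio is the WSL figure divided by the native figure; for throughput it is the reverse. A small sketch using only the numbers above (`ratio` is a hypothetical helper):

```shell
# Hypothetical helper: print numerator / denominator with one decimal place.
ratio() {
  awk -v num="$1" -v den="$2" 'BEGIN { printf "%.1fx\n", num / den }'
}

echo "Translation request: $(ratio 2300 300)"
echo "Code review:         $(ratio 5000 800)"
echo "Tokens/sec (low):    $(ratio 60 5)"
echo "Tokens/sec (high):   $(ratio 80 3)"
```

The latency rows work out to roughly 6-8x, while the throughput ratio spans about 12x to 27x depending on which ends of the two ranges are compared.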

What stays the same

  • Same docker-compose.yml — no changes needed
  • Same agent containers, same ports
  • Same AgentCreator UI on port 7000
  • Same Tailscale access for team
  • Same project files, same specs, same everything

Ongoing management (all via SSH)

# Check vLLM
docker logs gpu-machine-vllm-1 --tail 10

# Restart vLLM
cd ~/vibercoded/gpu-machine && docker compose restart

# Check agents
curl http://localhost:7000/api/agents

# View server logs
tail -f ~/vibercoded/data/server.log

# Run tests
cd ~/vibercoded && python3 tests/integration/test_runner.py
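
The individual checks can be folded into a one-shot status summary. A sketch, assuming the ports used in this guide (8000 for vLLM, 7000 for AgentCreator); `status_line` is a hypothetical helper:

```shell
# status_line NAME COMMAND [ARGS...]
# Prints "NAME: up" if COMMAND succeeds, otherwise "NAME: down".
status_line() {
  sl_name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "$sl_name: up"
  else
    echo "$sl_name: down"
  fi
}

# --max-time keeps a hung service from blocking the summary.
if command -v curl >/dev/null 2>&1; then
  status_line "vLLM" curl -sf --max-time 5 http://localhost:8000/health
  status_line "AgentCreator" curl -sf --max-time 5 http://localhost:7000/api/agents
fi
```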

Time estimate

| Step | Time |
|---|---|
| Flash USB + install Ubuntu | 15 min |
| NVIDIA drivers + reboot | 5 min |
| Docker + toolkit | 5 min |
| Python + project | 5 min |
| Pull model + start vLLM | 10 min |
| Total | ~40 min |