Switching from Windows/WSL to native Linux gives roughly 10-20x higher inference throughput (60-80 tok/s vs 3-5 tok/s).
- USB stick (8GB+)
- Monitor + keyboard (for install only — headless after)
- Internet connection
On any Windows machine:
- Download Ubuntu Server 24.04 LTS from https://ubuntu.com/download/server
- Download Rufus from https://rufus.ie
- Flash the ISO to USB with Rufus (default settings are fine)
- Plug USB into GPU machine, boot from it (F12 or Del for boot menu)
- Choose Ubuntu Server (minimized)
- Use entire disk (it will wipe Windows — make sure you've backed up)
- Enable OpenSSH server when prompted
- Set username/password
- Note the IP address shown after install, or set a static one
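If you prefer a static address over the DHCP one the installer reports, one common route on Ubuntu Server is a netplan override. The sketch below only writes a candidate file to the current directory and changes nothing on the system. The interface name `enp5s0`, the address `192.168.1.50/24`, the gateway `192.168.1.1`, and the DNS servers are placeholders for illustration, not values from this setup; check the real interface name with `ip -br link` first.

```shell
set -eu

# All values below are placeholders (assumptions); adjust them to your network.
IFACE="enp5s0"
ADDRESS="192.168.1.50/24"
GATEWAY="192.168.1.1"
OUT="99-static-ip.yaml"

# Write a netplan fragment that disables DHCP and pins a static IPv4 address.
cat > "$OUT" <<EOF
network:
  version: 2
  ethernets:
    ${IFACE}:
      dhcp4: false
      addresses: [${ADDRESS}]
      routes:
        - to: default
          via: ${GATEWAY}
      nameservers:
        addresses: [1.1.1.1, 8.8.8.8]
EOF

# netplan warns about configuration files that are readable by other users.
chmod 600 "$OUT"
echo "Wrote $OUT"
```

To apply it, copy the file into `/etc/netplan/` and run `sudo netplan try`, which rolls the change back automatically unless you confirm it. That is a useful safety net on a machine you plan to run headless.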
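ssh youruser@machine-ip
# RTX 50-series (Blackwell) GPUs need NVIDIA's open kernel modules; if nvidia-smi finds no device after the reboot, install nvidia-driver-570-open instead
sudo apt update && sudo apt install -y nvidia-driver-570
sudo reboot
ssh youruser@machine-ip
nvidia-smi # Should show RTX 5090
# Docker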
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
newgrp docker
# NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU in Docker
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi
sudo apt install -y python3.12 python3-pip python3.12-venv git
git clone <your-repo-url> ~/vibercoded
cd ~/vibercoded
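# Ubuntu 24.04 blocks system-wide pip installs (PEP 668), so install into a virtual environment
python3 -m venv .venv
source .venv/bin/activate  # repeat in each new shell before running python3
pip install -e ".[dev]"
curl -fsSL https://tailscale.com/install.sh | sh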
sudo tailscale up
Follow the link to authenticate. Your team accesses the machine via Tailscale IP.
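To find the address teammates should use, `tailscale ip -4` prints the machine's Tailscale IPv4 address. A minimal sketch that turns it into URLs to share: the `service_urls` helper is hypothetical, and it assumes the vLLM port (8000) and the AgentCreator port (7000) used in this guide are published on all interfaces rather than on localhost only.

```shell
# Print the URLs teammates can open, given the machine's Tailscale IP.
# service_urls is a hypothetical helper, not part of Tailscale.
service_urls() {
  local ip="$1"
  printf 'vLLM:         http://%s:8000\n' "$ip"
  printf 'AgentCreator: http://%s:7000\n' "$ip"
}

# On the GPU machine: look up the Tailscale IPv4 address and print the URLs.
if command -v tailscale >/dev/null 2>&1; then
  ts_ip="$(tailscale ip -4 2>/dev/null | head -n 1 || true)"
  if [ -n "$ts_ip" ]; then
    service_urls "$ts_ip"
  else
    echo "Tailscale is installed but not connected; run: sudo tailscale up" >&2
  fi
fi
```

Teammates on the same tailnet can then run, for example, `curl http://<tailscale-ip>:7000/api/agents` from their own machines.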
cd ~/vibercoded
# Start vLLM
cd gpu-machine && docker compose up -d && cd ..
# Wait for model to load (~2-3 min)
until curl -sf http://localhost:8000/health > /dev/null 2>&1; do sleep 5; done
echo "vLLM ready"
# Start AgentCreator
nohup python3 -c "from agent_creator.main import main; main()" > data/server.log 2>&1 &
echo "AgentCreator running on port 7000"

| | Windows/WSL | Native Linux |
|---|---|---|
| Translation request | ~2,300ms | ~300ms |
| Code review | ~5,000ms | ~800ms |
| Tokens/sec | 3-5 | 60-80 |
| GPU clock locking | Required (resets on reboot) | Not needed (full speed automatic) |
| pin_memory | Disabled (WSL limitation) | Enabled (full speed) |
| Docker GPU access | Via WSL2 layer | Direct |
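The speedup implied by each row can be checked with a line of arithmetic. The sketch below uses `awk` for the floating-point division; the `speedup` helper is hypothetical, and the inputs are the approximate figures from the table.

```shell
# speedup SLOW FAST -> ratio SLOW/FAST, printed with two decimal places.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

# Latency is lower-is-better: divide the WSL figure by the Linux figure.
echo "Translation request: $(speedup 2300 300)x faster"
echo "Code review:         $(speedup 5000 800)x faster"

# Throughput is higher-is-better: divide the Linux figure by the WSL figure.
echo "Tokens/sec, conservative (60 vs 5): $(speedup 60 5)x"
echo "Tokens/sec, optimistic (80 vs 3):   $(speedup 80 3)x"
```

On these figures, per-request latency improves by roughly 6-8x, while token throughput improves by roughly 12-27x depending on which ends of the two ranges are compared.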
- Same `docker-compose.yml`, no changes needed
- Same agent containers, same ports
- Same AgentCreator UI on port 7000
- Same Tailscale access for team
- Same project files, same specs, same everything
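Because the ports are unchanged, confirming that both services answer is a quick way to verify the move. A minimal sketch, assuming the vLLM health endpoint at `http://localhost:8000/health` and the AgentCreator endpoint at `http://localhost:7000/api/agents` as used in this guide; the `wait_for_http` helper and its default timeout values are hypothetical.

```shell
# wait_for_http URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls URL until it answers with a successful HTTP status, or gives up.
wait_for_http() {
  local url="$1" timeout="${2:-300}" interval="${3:-5}" waited=0
  # -f makes curl fail on HTTP errors (4xx/5xx), not only on refused connections.
  until curl -sf --max-time 5 "$url" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$url is up"
}
```

For example, `wait_for_http http://localhost:8000/health 300 && wait_for_http http://localhost:7000/api/agents 30` waits up to five minutes for the model to load and then checks AgentCreator. Unlike an unbounded polling loop, it returns a non-zero status instead of hanging if a service never comes up.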
# Check vLLM
docker logs gpu-machine-vllm-1 --tail 10
# Restart vLLM
cd ~/vibercoded/gpu-machine && docker compose restart
# Check agents
curl http://localhost:7000/api/agents
# View server logs
tail -f ~/vibercoded/data/server.log
# Run tests
cd ~/vibercoded && python3 tests/integration/test_runner.py

| Step | Time |
|---|---|
| Flash USB + install Ubuntu | 15 min |
| NVIDIA drivers + reboot | 5 min |
| Docker + toolkit | 5 min |
| Python + project | 5 min |
| Pull model + start vLLM | 10 min |
| Total | ~40 min |