- Docker Desktop running (open from Start menu, wait for it to fully start)
- Python 3.13 installed (already on PATH)
Run this after every reboot to lock the GPU clocks and prevent Windows from throttling the GPU:
nvidia-smi -lgc 1500,3090
Without this, inference is ~5x slower.
vLLM auto-starts with Docker Desktop (restart: unless-stopped). Verify it's ready:
curl http://localhost:8000/health
If it's not running:
cd C:\vibercoded\gpu-machine
docker compose up -d
First boot takes ~2-3 minutes to load the model into GPU memory.
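Because the first boot can take a few minutes, it may be convenient to poll the health endpoint instead of re-running `curl` by hand. The following is a minimal sketch, assuming `http://localhost:8000/health` answers with HTTP 200 once the model is loaded; the function names, the 300-second limit, and the 5-second interval are illustrative choices, not part of the project.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: treat as "not ready yet".
        return False


def wait_until_healthy(url, max_wait=300.0, interval=5.0,
                       probe=http_ok, sleep=time.sleep):
    """Poll `probe(url)` every `interval` seconds.

    Returns True as soon as the probe succeeds, or False once `max_wait`
    seconds of sleeping have accumulated without success.
    `probe` and `sleep` are injectable so the loop can be tested offline.
    """
    waited = 0.0
    while True:
        if probe(url):
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Assumed endpoint: the same URL used with curl above.
    ready = wait_until_healthy("http://localhost:8000/health")
    print("vLLM ready" if ready else "vLLM did not become healthy in time")
```

Run it in a second terminal right after `docker compose up -d`; it returns as soon as the endpoint responds.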
cd C:\vibercoded
python -c "from agent_creator.main import main; main()"
Server starts at http://localhost:7000.
- Dashboard: http://localhost:7000
- Create agent: http://localhost:7000/agents/new (describe in English or paste YAML)
- Topology view: http://localhost:7000/topology
- API docs: http://localhost:7000/docs
Manage agents through the dashboard, or via the API:
# Start
curl -X POST http://localhost:7000/api/agents/{name}/start
# Stop
curl -X POST http://localhost:7000/api/agents/{name}/stop
# Build from spec
curl -X POST http://localhost:7000/api/agents/{name}/build
# Health check
curl http://localhost:7000/api/agents/{name}/health
- Stop AgentCreator: Ctrl+C in the terminal
- Stop agent containers: use Docker Desktop or
docker stop $(docker ps -q --filter "name=ferite-agent")
- Stop vLLM:
docker stop gpu-machine-vllm-1
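For scripting, the agent endpoints shown earlier (start, stop, build, health) can be wrapped in a small helper. This is a rough sketch using only the standard library, assuming the endpoints behave as the `curl` commands suggest: `POST` with no body for start, stop, and build, and `GET` for health. The helper names and the agent name `demo` are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7000"
POST_ACTIONS = {"start", "stop", "build"}  # mirrors the curl -X POST calls
GET_ACTIONS = {"health"}                   # mirrors the plain curl call


def agent_request(name: str, action: str, base_url: str = BASE_URL):
    """Build (but do not send) the HTTP request for an agent action."""
    if action in POST_ACTIONS:
        method = "POST"
    elif action in GET_ACTIONS:
        method = "GET"
    else:
        raise ValueError(f"unsupported action: {action!r}")
    # Percent-encode the agent name so it is safe inside the URL path.
    safe_name = urllib.parse.quote(name, safe="")
    url = f"{base_url}/api/agents/{safe_name}/{action}"
    return urllib.request.Request(url, method=method)


def call_agent(name: str, action: str, timeout: float = 30.0):
    """Send the request and return (status code, response body as text)."""
    with urllib.request.urlopen(agent_request(name, action), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "demo" is a placeholder agent name; replace it with a real one.
    print(call_agent("demo", "health"))
```

Building the request separately from sending it keeps the URL and method logic easy to check without a running server.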
| Service | Port |
|---|---|
| AgentCreator UI + API | 7000 |
| vLLM (LLM inference) | 8000 |
| Agent containers | 9000-9200 |
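When something fails to start, a quick look at which of these ports are accepting connections can narrow down the cause. The sketch below checks the two fixed ports from the table with a plain TCP connect; it assumes the services listen on `localhost`, and the helper name is illustrative. It only shows that something is listening, not that the service behind the port is healthy.

```python
import socket

# Fixed ports taken from the table above.
SERVICE_PORTS = {
    "AgentCreator UI + API": 7000,
    "vLLM (LLM inference)": 8000,
}


def is_port_open(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable.
        return False


if __name__ == "__main__":
    for service, port in SERVICE_PORTS.items():
        state = "listening" if is_port_open(port) else "not reachable"
        print(f"{service} (port {port}): {state}")
```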