Commit 74aa38f

docs: add first version of the docs (WIP)
1 parent 7232aba commit 74aa38f

4 files changed

Lines changed: 622 additions & 0 deletions

docs/analysis.md

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
# Analysis

Generate benchmarks, analyze performance metrics, and integrate with BalatroBench to visualize detailed statistics.

## Benchmark Generation

### Basic Benchmarking

Run benchmarks to evaluate model performance:

```bash
# Benchmark current configuration
balatrollm benchmark

# Benchmark specific model across strategies
balatrollm --model openai/gpt-oss-120b benchmark

# Benchmark with multiple runs for more reliable statistics
balatrollm --runs 20 benchmark
```

### Comprehensive Benchmarking

Generate benchmarks across multiple dimensions:

```bash
# Benchmark all models with default strategy
make balatrobench

# Benchmark specific strategy across models
balatrollm --strategy aggressive --runs 15 benchmark

# Benchmark every combination of strategy and model
for strategy in default aggressive; do
  for model in openai/gpt-oss-20b openai/gpt-oss-120b qwen/qwen3-235b-a22b-2507; do
    # Quote the variables so unusual names are passed through unchanged
    balatrollm --strategy "$strategy" --model "$model" --runs 10 benchmark
  done
done
```

## Benchmark Results

### Result Structure

Benchmarks are organized hierarchically by version, strategy, and provider:

```
benchmarks/
├── v0.10.0/                      # Version
│   ├── default/                  # Strategy
│   │   ├── openrouter/           # Provider
│   │   │   ├── gpt-oss-20b.json
│   │   │   └── gpt-oss-120b.json
│   │   └── leaderboard.json      # Strategy summary
│   └── aggressive/
│       ├── openrouter/
│       └── leaderboard.json
```

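The layout above can be read back from a file path. The following is a minimal sketch, assuming per-model files always sit at `benchmarks/<version>/<strategy>/<provider>/<model>.json` as in the tree; `BenchmarkFile` and `parse_benchmark_path` are hypothetical helper names, not part of BalatroLLM.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class BenchmarkFile(NamedTuple):
    """Location of one per-model result file (hypothetical helper type)."""
    version: str
    strategy: str
    provider: str
    model: str


def parse_benchmark_path(path: str) -> BenchmarkFile:
    """Split '<root>/<version>/<strategy>/<provider>/<model>.json' into its parts."""
    parts = PurePosixPath(path).parts
    name = parts[-1] if parts else ""
    # leaderboard.json is a strategy summary, not a per-model file
    if len(parts) < 4 or not name.endswith(".json") or name == "leaderboard.json":
        raise ValueError(f"not a per-model benchmark file: {path}")
    version, strategy, provider, filename = parts[-4:]
    return BenchmarkFile(version, strategy, provider, filename.removesuffix(".json"))


print(parse_benchmark_path("benchmarks/v0.10.0/default/openrouter/gpt-oss-20b.json"))
```

Keeping the four levels in a named tuple makes it easy to group result files by strategy or provider before comparing them.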
### Understanding Metrics

Key performance indicators in benchmark results:

- **Win Rate**: Percentage of games won
- **Average Score**: Mean final score across runs
- **Consistency**: Standard deviation of scores (lower means more consistent)
- **Efficiency**: Score per ante progression
- **Strategy Adherence**: How well the bot follows strategy guidelines

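The first three metrics can be computed directly from per-run records. This is a rough sketch rather than BalatroLLM's own implementation: it assumes each run is a dictionary with a `final_score` field (as used by the `jq` examples in this page) and a boolean `won` field, which is a hypothetical name chosen for illustration.

```python
from statistics import mean, pstdev


def summarize_runs(runs: list[dict]) -> dict:
    """Compute win rate, average score, and score spread for a list of runs.

    Assumes each run has 'final_score' (number) and 'won' (bool);
    'won' is a hypothetical field name.
    """
    if not runs:
        raise ValueError("need at least one run")
    scores = [run["final_score"] for run in runs]
    wins = sum(1 for run in runs if run["won"])
    return {
        "win_rate": 100 * wins / len(runs),  # percentage of games won
        "avg_score": mean(scores),           # mean final score
        "std_dev": pstdev(scores),           # population standard deviation
    }


example = [
    {"final_score": 100, "won": True},
    {"final_score": 300, "won": False},
]
print(summarize_runs(example))
```

Efficiency and strategy adherence are left out because the source does not define how they are calculated.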
## BalatroBench Integration

### Overview

[BalatroBench](https://s1m0n38.github.io/balatrobench/) is a web-based dashboard for visualizing and comparing LLM performance in Balatro. It provides interactive charts, leaderboards, and detailed analytics.

*[Screenshot placeholder: BalatroBench dashboard showing model comparison]*

### Uploading Results

Integrate your local benchmark results with BalatroBench:

```bash
# Generate benchmarks locally
balatrollm --runs 20 benchmark

# Upload to BalatroBench (coming soon; not yet available)
balatrollm benchmark --upload

# Or manually copy the results into a BalatroBench data directory
cp benchmarks/v0.10.0/default/leaderboard.json /path/to/balatrobench/data/
```

### Viewing Results

The web interface provides the following views:

1. **Model Comparison**: Side-by-side performance metrics
2. **Strategy Analysis**: How different strategies perform across models
3. **Trend Analysis**: Performance changes over time
4. **Detailed Breakdowns**: Ante-by-ante progression analysis

*[Screenshot placeholder: Model comparison view in BalatroBench]*

## Local Analysis

### Command-Line Analysis

Analyze results directly from the command line:

```bash
# View latest benchmark summary
jq . benchmarks/v0.10.0/default/leaderboard.json

# Compare models
jq '.models[] | {name: .model, win_rate: .metrics.win_rate, avg_score: .metrics.avg_score}' \
  benchmarks/v0.10.0/default/leaderboard.json

# Find top performer by average score
jq '.models | sort_by(.metrics.avg_score) | reverse | .[0]' \
  benchmarks/v0.10.0/default/leaderboard.json
```

### Custom Analysis Scripts

Combine `jq` with standard shell tools for more specific questions:

```bash
# Calculate model efficiency (score per run) for every per-model file
find benchmarks -name "*.json" -not -name "leaderboard.json" | \
  xargs jq -r '[.model, (.total_score / .total_runs)] | @csv'

# Compare strategies for the same model
diff <(jq '.models[] | select(.model=="gpt-oss-20b") | .metrics' \
        benchmarks/v0.10.0/default/leaderboard.json) \
     <(jq '.models[] | select(.model=="gpt-oss-20b") | .metrics' \
        benchmarks/v0.10.0/aggressive/leaderboard.json)
```

## Performance Tracking

### Continuous Monitoring

Set up automated benchmarking:

```bash
#!/bin/bash
# Daily benchmark script: store each day's runs in a dated directory
DATE=$(date +%Y%m%d)
balatrollm --runs 5 --runs-dir "daily_benchmarks/$DATE" benchmark
```

### Regression Testing

Monitor performance across versions:

```bash
# Extract the average score per model for the current version
jq '.models[] | {model, current: .metrics.avg_score}' \
  benchmarks/v0.10.0/default/leaderboard.json > current.json

# Extract the same for the previous version
jq '.models[] | {model, previous: .metrics.avg_score}' \
  benchmarks/v0.9.0/default/leaderboard.json > previous.json

# Join and compare: -s slurps all objects into one array, which is then
# grouped by model and merged into {model, current, previous}
jq -s 'group_by(.model) | map(add)' current.json previous.json
```

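The same comparison can be scripted so that regressions stand out. The sketch below assumes both leaderboards have the shape the `jq` commands above rely on (a `models` list whose entries carry `model` and `metrics.avg_score`); `compare_versions` is a hypothetical helper, not a BalatroLLM function.

```python
def compare_versions(current: dict, previous: dict) -> list[dict]:
    """Join two leaderboards on model name and report the change in average score.

    Models present in only one of the two leaderboards are skipped.
    """
    prev_scores = {m["model"]: m["metrics"]["avg_score"] for m in previous["models"]}
    rows = []
    for entry in current["models"]:
        name = entry["model"]
        if name not in prev_scores:
            continue
        cur = entry["metrics"]["avg_score"]
        rows.append({
            "model": name,
            "current": cur,
            "previous": prev_scores[name],
            "delta": cur - prev_scores[name],
        })
    # Largest regressions (most negative delta) come first
    return sorted(rows, key=lambda row: row["delta"])
```

To use it, load each `leaderboard.json` with `json.load` and pass the two dictionaries in. A negative `delta` marks a model whose average score dropped between versions.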
## Interpreting Results

### Statistical Significance

Run enough games that averages are stable before drawing conclusions:

```bash
# Run sufficient samples for confidence
balatrollm --runs 30 benchmark  # Minimum recommended

# Check variance in results (mean and population standard deviation)
jq '.detailed_runs[] | .final_score' benchmarks/latest/model.json | \
  awk '{sum+=$1; sumsq+=$1*$1} END {print "Mean:", sum/NR, "StdDev:", sqrt((sumsq-sum*sum/NR)/NR)}'
```

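To see why more runs help, it is useful to put an interval around the mean. The following is a sketch under a normal approximation (z = 1.96 for roughly 95% coverage), which is only approximate for small samples; `mean_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(scores: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Return (mean, lower, upper) using a normal approximation.

    The half-width is z * s / sqrt(n), so it shrinks as the number of runs grows.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate spread")
    m = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))  # stdev = sample std dev
    return m, m - half_width, m + half_width


print(mean_confidence_interval([10, 20, 30]))
```

If the intervals of two models overlap heavily, the difference in their average scores may not be meaningful at the current number of runs.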
### Model Selection Criteria

Choose models based on your priorities:

- **Consistency**: Low standard deviation in scores
- **Peak Performance**: Highest maximum scores achieved
- **Win Rate**: Reliability in completing games successfully
- **Speed**: Faster response times for real-time applications

### Strategy Optimization

Use results to refine strategies:

```bash
# Identify successful patterns (runs scoring above 8000)
jq '.detailed_runs[] | select(.final_score > 8000) | .strategy_decisions' \
  benchmarks/v0.10.0/aggressive/openrouter/gpt-oss-120b.json

# Find failure modes (runs scoring below 2000)
jq '.detailed_runs[] | select(.final_score < 2000) | .failure_reason' \
  benchmarks/v0.10.0/default/openrouter/gpt-oss-20b.json
```

docs/index.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# BalatroLLM

**LLM-powered bot that plays Balatro using strategic decision making**

______________________________________________________________________

!!! warning "Pre-1.0 Development Notice"

    This project is in its pre-1.0 development phase. Under the [Semantic Versioning](https://semver.org/) specification, minor version updates (0.x.y → 0.(x+1).0) may introduce breaking changes. Review the release notes carefully before upgrading.

BalatroLLM is a bot that uses Large Language Models to play Balatro, the roguelike poker deck-building game. It communicates with LLM providers through OpenAI-compatible APIs and makes strategic decisions based on the game state. Whether you are running benchmarks across different models or exploring AI gaming strategies, BalatroLLM provides a framework for automated Balatro gameplay.

<div class="grid cards" markdown>

- :material-cog:{ .lg .middle } __Setup__

    ---

    Installation guide covering dependencies, environment setup, and API key configuration.

    [:octicons-arrow-right-24: Setup](setup.md)

- :material-play:{ .lg .middle } __Usage__

    ---

    Learn how to run the bot, configure strategies, and customize gameplay parameters.

    [:octicons-arrow-right-24: Usage](usage.md)

- :material-chart-line:{ .lg .middle } __Analysis__

    ---

    Generate benchmarks, analyze performance metrics, and integrate with BalatroBench for comprehensive statistics.

    [:octicons-arrow-right-24: Analysis](analysis.md)

- :octicons-sparkle-fill-16:{ .lg .middle } __Documentation for LLM__

    ---

    Documentation in [llms.txt](https://llmstxt.org/) format. Paste the following link (or its contents) into an LLM chat.

    [:octicons-arrow-right-24: llms-full.txt](llms-full.txt)

</div>

docs/setup.md

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
# Setup

This guide walks through installing and configuring BalatroLLM to run LLM-powered Balatro bots.

## Prerequisites

- **Python 3.13+**: BalatroLLM requires Python 3.13 or later
- **Balatro Game**: You need a copy of Balatro installed
- **BalatroBot**: The underlying framework for Balatro automation
- **API Access**: An API key for an LLM provider (OpenRouter recommended)

## Installation

### 1. Install BalatroLLM

```bash
# Clone the repository
git clone https://github.com/S1M0N38/balatrollm.git
cd balatrollm

# Install with uv (recommended)
uv sync --all-extras --group dev

# Or install with pip
pip install -e .
```

### 2. Set up BalatroBot

BalatroLLM depends on BalatroBot for game communication. Follow the [BalatroBot installation guide](https://s1m0n38.github.io/balatrobot/installation/) to:

1. Install the BalatroBot Steamodded mod
2. Configure Balatro for bot communication
3. Verify the setup works

### 3. Configure Environment Variables

Create a `.envrc` file in the project root:

```bash
# Copy the example file
cp .envrc.example .envrc

# Edit .envrc so that it exports your API key
export OPENROUTER_API_KEY="your-api-key-here"

# Load the environment
source .envrc
```

### 4. Verify Installation

Test that everything is working:

```bash
# Check available models
balatrollm --list-models

# Confirm the CLI is installed and show the available options
balatrollm --help
```

## API Key Setup

### OpenRouter (Recommended)

OpenRouter provides access to multiple LLM providers through a single API:

1. Sign up at [openrouter.ai](https://openrouter.ai)
2. Generate an API key
3. Add it to your `.envrc` file as `OPENROUTER_API_KEY`

### Other Providers

BalatroLLM supports any OpenAI-compatible API:

```bash
# Use custom provider
balatrollm --base-url https://api.your-provider.com/v1 --api-key your-key
```

## Game Setup

### Start Balatro

Use the provided script to launch Balatro with bot support:

```bash
# Start single instance on default port 12346
./balatro.sh

# Start with custom port
./balatro.sh -p 12347

# Start multiple instances for parallel runs
./balatro.sh -p 12346 -p 12347

# Start in headless mode for servers
./balatro.sh --headless --fast
```

### Verify Connection

Check that BalatroLLM can connect to the game:

```bash
# Run a quick test (will exit after connection)
balatrollm --runs 1
```

## Troubleshooting

### Connection Issues

If the bot can't connect to Balatro:

1. Ensure Balatro is running with the BalatroBot mod
2. Check that the port matches (default: 12346)
3. Verify firewall settings allow local connections

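Step 2 can be checked without starting a run. The sketch below assumes the BalatroBot mod accepts TCP connections on the configured port, which this page does not state explicitly; `port_is_open` is a hypothetical helper and only tests reachability, not that the listener is actually Balatro.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 12346, timeout: float = 1.0) -> bool:
    """Return True if something accepts a TCP connection on host:port.

    Assumption: the game-side mod listens over TCP on this port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

If `port_is_open()` returns `False` while Balatro is running, the port passed to `./balatro.sh -p` and the one the bot uses probably differ.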
### API Issues

If you get API errors:

1. Verify your API key is correct
2. Check your account balance/credits
3. Test with a different model using `--model`

### Performance Issues

For better performance:

1. Use `--fast` mode in the Balatro script
2. Run multiple instances in parallel
3. Choose faster models for initial testing
