Commit 86c5653

docs: remove legacy README
1 parent 292dff9 commit 86c5653

1 file changed

Lines changed: 1 addition & 232 deletions

README.md

@@ -1,234 +1,3 @@
1 1
# BalatroBench
2 2

3-
A community-driven benchmark platform for evaluating Large Language Models' strategic performance in Balatro through intelligent tool-calling and decision-making.
4-
5-
## 🎯 What is BalatroBench?
6-
7-
BalatroBench provides a standardized way to evaluate how well different AI models can play Balatro, the popular poker-inspired roguelike card game. The benchmark tests strategic thinking, decision-making, and tool-calling capabilities across different LLM models.
8-
9-
## 🚀 Quick Start
10-
11-
This is a **static website** that works with any web server or GitHub Pages. No build process required!
12-
13-
### Local Development
14-
15-
1. Clone the repository:
16-
```bash
git clone <YOUR_GIT_URL>
cd balatrobench
```
20-
21-
2. Serve the files locally:
22-
```bash
# Using Python (recommended)
python -m http.server 8000

# Using Node.js (if you have it); -l pins the port so it matches the URL in step 3
npx serve -l 8000 .

# Using any other static file server
```
31-
32-
3. Open http://localhost:8000 in your browser
33-
34-
### GitHub Pages Deployment
35-
36-
1. Push your changes to the `main` branch
37-
2. Go to repository Settings > Pages
38-
3. Set source to "Deploy from a branch"
39-
4. Select `main` branch and `/ (root)` folder
40-
5. Your site will be available at `https://yourusername.github.io/balatrobench`
41-
42-
## 📁 Project Structure
43-
44-
```
├── index.html                  # Main page (Official Benchmark)
├── community.html              # Community submissions page
├── about.html                  # About page with detailed information
├── submit.html                 # Redirects to CONTRIBUTING.md
├── CONTRIBUTING.md             # Detailed submission guidelines
├── js/
│   └── app.js                  # JavaScript for data loading and UI
├── scripts/
│   ├── process-benchmarks.js   # Benchmark data processor
│   └── analyze-benchmarks.js   # Benchmark analysis tool
├── data/
│   ├── benchmarks/             # Raw benchmark data by version
│   │   └── v0.2.0/
│   │       └── default/
│   │           ├── leaderboard.json    # Processed leaderboard
│   │           ├── cerebras-*.json     # Individual benchmark runs
│   │           └── ...
│   ├── strategies/             # Strategy templates and tools
│   │   └── default/
│   │       ├── TOOLS.json              # Available game tools
│   │       ├── STRATEGY.md.jinja       # Strategy template
│   │       ├── MEMORY.md.jinja         # Memory template
│   │       └── GAMESTATE.md.jinja      # Game state template
│   └── leaderboard.json        # Main leaderboard (auto-generated)
├── public/                     # Static assets
└── README.md
```
72-
73-
## 🏆 Official Benchmark
74-
75-
The official leaderboard tracks performance across standardized seeds and configurations:
76-
77-
- **Balatro Version**: v1.0.1n
78-
- **Framework**: BalatroLLM v0.2.0+
79-
- **Strategy**: Template-based strategic prompting system
80-
- **Seeds**: Consistent seeds for reproducibility
81-
- **Metrics**: Average ante reached, win rate, token efficiency, completion rate
82-
- **Models**: Cerebras GPT-OSS-120B, Cerebras Qwen3-235B, and more
83-
84-
## 👥 Community Contributions
85-
86-
### Submitting Your Results
87-
88-
There are two ways to contribute:
89-
90-
#### Option 1: Submit Raw Benchmark Data (Recommended)
91-
92-
1. **Run benchmarks** using the BalatroLLM framework
93-
2. **Add your results** to `data/benchmarks/v{version}/{strategy}/`
94-
- Include the complete `{model}_benchmark.json` files
95-
- These contain full game progression, LLM interactions, and tool calls
96-
3. **Process the data** using `node scripts/process-benchmarks.js`
97-
4. **Submit a Pull Request** with both raw data and updated leaderboard
98-
99-
#### Option 2: Submit Strategy Documentation
100-
101-
1. **Fork this repository**
102-
2. **Create strategy templates** in `data/strategies/{your-strategy}/`
103-
- Copy the structure from `data/strategies/default/`
104-
- Customize the Jinja2 templates for your approach
105-
3. **Document your methodology** with clear explanations
106-
4. **Submit a Pull Request** with title: "Strategy Contribution: [Your Strategy Name]"
107-
108-
### Submission Requirements
109-
110-
- ✅ Valid benchmark results from BalatroLLM v0.2.0+
111-
- ✅ Complete game progression data (not just summary statistics)
112-
- ✅ Clear strategy description and methodology
113-
- ✅ Reproducible results with seed consistency
114-
- ✅ No offensive or inappropriate content
115-
116-
## 🛠️ Technologies Used
117-
118-
- **HTML5** - Semantic markup
119-
- **Tailwind CSS** - Styling (via CDN)
120-
- **Vanilla JavaScript** - Dynamic content loading
121-
- **Node.js** - Benchmark processing scripts
122-
- **Jinja2 Templates** - Strategy prompt templating
123-
- **Font Awesome** - Icons (via CDN)
124-
- **JSON** - Data storage and interchange
125-
126-
## 📊 Data Management
127-
128-
BalatroBench keeps all of its data as JSON files in the repository, organized as follows:
129-
130-
### Raw Benchmark Data
131-
- `data/benchmarks/v{version}/{strategy}/` - Versioned benchmark results
132-
- Individual files: `{model}_benchmark.json` - Complete run data with game progression, LLM interactions, and tool calls
133-
- Structured by version and strategy for historical tracking
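
The directory layout above can be expressed as a pair of small helpers. This is a minimal sketch, not code from the repository: the function names `benchmarkPath` and `parseBenchmarkPath` are hypothetical, and the only thing taken from the README is the `data/benchmarks/v{version}/{strategy}/{model}_benchmark.json` pattern.

```javascript
const path = require("node:path");

// Build the repository-relative path for one raw benchmark file.
// Hypothetical helper; only the path pattern comes from the README.
function benchmarkPath(version, strategy, model) {
  return path.posix.join(
    "data",
    "benchmarks",
    `v${version}`,
    strategy,
    `${model}_benchmark.json`
  );
}

// Recover { version, strategy, model } from such a path,
// or return null when the path does not follow the pattern.
function parseBenchmarkPath(filePath) {
  const match =
    /^data\/benchmarks\/v([^/]+)\/([^/]+)\/(.+)_benchmark\.json$/.exec(filePath);
  if (!match) return null;
  const [, version, strategy, model] = match;
  return { version, strategy, model };
}

// "example-model" is a placeholder, not a model from the leaderboard.
console.log(benchmarkPath("0.2.0", "default", "example-model"));
// prints "data/benchmarks/v0.2.0/default/example-model_benchmark.json"
```

Keeping version and strategy in the path rather than inside each file means results from different framework versions never overwrite one another, which is what makes the historical tracking possible.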
134-
135-
### Processed Data
136-
- `data/leaderboard.json` - Auto-generated leaderboard from all benchmark data
137-
- Aggregated statistics: performance scores, win rates, token efficiency
138-
- Generated by `scripts/process-benchmarks.js`
139-
140-
### Strategy System
141-
- `data/strategies/default/` - Template-based strategy system
142-
- Jinja2 templates for consistent prompting across models
143-
- Tool definitions and game state templates
144-
145-
### Processing Pipeline
146-
```bash
# Process raw benchmark data into leaderboard
node scripts/process-benchmarks.js

# Alternative analysis tool
node scripts/analyze-benchmarks.js
```
153-
154-
This approach provides:
155-
- Version control of all data and results
156-
- Automated leaderboard generation
157-
- Historical benchmark tracking
158-
- Reproducible evaluation methodology
159-
- No database setup required
160-
161-
## 🤝 Contributing
162-
163-
We welcome contributions! You can:
164-
165-
1. **Submit strategies** via pull requests
166-
2. **Report issues** or suggest improvements
167-
3. **Improve the website** (design, features, documentation)
168-
169-
## 📈 Adding New Official Results
170-
171-
To add new benchmark results:
172-
173-
### Adding Raw Benchmark Data
174-
175-
1. **Add benchmark files** to `data/benchmarks/v{version}/{strategy}/`
176-
- Use format: `{model}_benchmark.json`
177-
- Include complete run data from BalatroLLM framework
178-
179-
2. **Process the data** to update leaderboards:
180-
```bash
node scripts/process-benchmarks.js
```
183-
184-
3. **Submit a pull request** with both raw data and updated leaderboard
185-
186-
### Data Processing Tools
187-
188-
The project includes two processing scripts:
189-
190-
- **`process-benchmarks.js`** - Primary tool for generating leaderboards from benchmark data
191-
- **`analyze-benchmarks.js`** - Alternative analysis tool with different aggregation methods
192-
193-
Both scripts automatically scan the `data/benchmarks/` directory and process all available benchmark files.
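
The scan-and-aggregate step described above could look roughly like the following. This is a sketch under stated assumptions, not the actual contents of `process-benchmarks.js`: the function names are hypothetical, and the file schema (`{ model, runs: [{ ante_reached, won }] }`) is invented for illustration, since the README does not document the real format. It uses only built-in Node.js modules, as the README says the scripts do.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively collect every *_benchmark.json file below a directory.
function findBenchmarkFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) files.push(...findBenchmarkFiles(full));
    else if (entry.name.endsWith("_benchmark.json")) files.push(full);
  }
  return files;
}

// Aggregate individual runs into one leaderboard row per model.
// ASSUMPTION: each run is { model, ante_reached, won }; the real schema may differ.
function summarize(runs) {
  const byModel = new Map();
  for (const run of runs) {
    const row =
      byModel.get(run.model) ?? { model: run.model, runs: 0, anteSum: 0, wins: 0 };
    row.runs += 1;
    row.anteSum += run.ante_reached;
    row.wins += run.won ? 1 : 0;
    byModel.set(run.model, row);
  }
  return [...byModel.values()]
    .map(({ model, runs, anteSum, wins }) => ({
      model,
      runs,
      avgAnte: anteSum / runs,
      winRate: wins / runs,
    }))
    .sort((a, b) => b.avgAnte - a.avgAnte); // best average ante first
}

// Read every benchmark file under benchmarksDir and summarize all runs.
// ASSUMPTION: each file is { model, runs: [...] }.
function buildLeaderboard(benchmarksDir) {
  const runs = findBenchmarkFiles(benchmarksDir).flatMap((file) => {
    const data = JSON.parse(fs.readFileSync(file, "utf8"));
    return (data.runs ?? []).map((run) => ({ ...run, model: data.model }));
  });
  return summarize(runs);
}

// Usage on in-memory data (model names are placeholders):
console.log(
  summarize([
    { model: "a", ante_reached: 4, won: false },
    { model: "a", ante_reached: 8, won: true },
    { model: "b", ante_reached: 3, won: false },
  ])
);
```

Separating `summarize` from the file scan keeps the aggregation logic testable without touching the file system, and a second script with different aggregation methods could reuse `findBenchmarkFiles` unchanged.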
194-
195-
## 🔧 Development & Customization
196-
197-
### Local Development Setup
198-
199-
```bash
# Clone the repository
git clone <YOUR_GIT_URL>
cd balatrobench

# No Node.js dependencies to install: there is no package.json yet,
# and the processing scripts use only built-in Node.js modules

# Serve the website locally
python -m http.server 8000
# or: npx serve -l 8000 .
```
211-
212-
### Customization Options
213-
214-
- **Styling**: Modify Tailwind classes in HTML files
215-
- **Functionality**: Edit `js/app.js` for frontend behavior
216-
- **Data Processing**: Customize `scripts/process-benchmarks.js`
217-
- **Strategy Templates**: Add new templates in `data/strategies/`
218-
- **Pages**: Create new HTML files following the existing pattern
219-
220-
### Processing Scripts
221-
222-
- **Process benchmarks**: `node scripts/process-benchmarks.js`
223-
- **Alternative analysis**: `node scripts/analyze-benchmarks.js`
224-
- Both scripts output to `data/leaderboard.json`
225-
226-
## 📜 License
227-
228-
This project is open source. Feel free to use, modify, and distribute.
229-
230-
## 🙋‍♀️ Support
231-
232-
- Open an issue on GitHub
233-
- Join our Discord community
234-
- Email: community@balatrobench.dev
3+
A benchmark platform for evaluating Large Language Models' strategic performance in Balatro through intelligent tool-calling and decision-making.
