
Commit dc81a31

docs: update README with new project structure and workflow
- Document new benchmark data organization by version
- Explain template-based strategy system with Jinja2
- Update contribution guidelines for raw benchmark submissions
- Add processing pipeline documentation
- Update project structure diagram and setup instructions
1 parent f1218df commit dc81a31

1 file changed

Lines changed: 129 additions & 54 deletions


README.md

@@ -15,7 +15,7 @@ This is a **static website** that works with any web server or GitHub Pages. No
 1. Clone the repository:
 ```bash
 git clone <YOUR_GIT_URL>
-cd balatrobench-site
+cd balatrobench
 ```

 2. Serve the files locally:
@@ -37,22 +37,36 @@ npx serve .
 2. Go to repository Settings > Pages
 3. Set source to "Deploy from a branch"
 4. Select `main` branch and `/ (root)` folder
-5. Your site will be available at `https://yourusername.github.io/balatrobench-site`
+5. Your site will be available at `https://yourusername.github.io/balatrobench`

 ## 📁 Project Structure

 ```
 ├── index.html # Main page (Official Benchmark)
 ├── community.html # Community submissions page
+├── about.html # About page with detailed information
 ├── submit.html # Redirects to CONTRIBUTING.md
 ├── CONTRIBUTING.md # Detailed submission guidelines
 ├── js/
-│   └── app.js # JavaScript for data loading
+│   └── app.js # JavaScript for data loading and UI
+├── scripts/
+│   ├── process-benchmarks.js # Benchmark data processor
+│   └── analyze-benchmarks.js # Benchmark analysis tool
 ├── data/
-│   ├── leaderboard.json # Official benchmark results
-│   └── strategies/ # Community submissions
-│       ├── strategy1.json
-│       └── strategy2.json
+│   ├── benchmarks/ # Raw benchmark data by version
+│   │   └── v0.2.0/
+│   │       └── default/
+│   │           ├── leaderboard.json # Processed leaderboard
+│   │           ├── cerebras-*.json # Individual benchmark runs
+│   │           └── ...
+│   ├── strategies/ # Strategy templates and tools
+│   │   └── default/
+│   │       ├── TOOLS.json # Available game tools
+│   │       ├── STRATEGY.md.jinja # Strategy template
+│   │       ├── MEMORY.md.jinja # Memory template
+│   │       └── GAMESTATE.md.jinja # Game state template
+│   └── leaderboard.json # Main leaderboard (auto-generated)
+├── public/ # Static assets
 └── README.md
 ```

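The new layout encodes the benchmark version and strategy in the directory path, and the README's contribution steps name per-model files `{model}_benchmark.json`. As a rough illustration of how a processing script might recover those fields from a path, here is a minimal JavaScript sketch. The helper `parseBenchmarkPath` is hypothetical (it is not taken from `scripts/process-benchmarks.js`) and it assumes the `_benchmark.json` suffix; the tree's `cerebras-*.json` example may follow a different naming pattern, in which case the suffix check would need adjusting.

```javascript
// Hypothetical helper, not taken from scripts/process-benchmarks.js.
// Splits a path of the form benchmarks/v{version}/{strategy}/{model}_benchmark.json
// into its components, assuming the "_benchmark.json" naming from the README.
function parseBenchmarkPath(filePath) {
  // Accept both "/" and "\" separators so Windows paths parse the same way.
  const parts = filePath.split(/[\\/]+/);
  const i = parts.lastIndexOf("benchmarks");
  // Expect exactly three segments after "benchmarks": version dir, strategy, file.
  if (i === -1 || parts.length !== i + 4) {
    return null;
  }
  const [versionDir, strategy, file] = parts.slice(i + 1);
  if (!versionDir.startsWith("v")) {
    return null; // version directories are prefixed with "v", e.g. v0.2.0
  }
  const suffix = "_benchmark.json";
  if (!file.endsWith(suffix)) {
    return null; // skips non-run files such as a processed leaderboard.json
  }
  return {
    version: versionDir.slice(1),
    strategy,
    model: file.slice(0, -suffix.length),
  };
}
```

Because the processed `leaderboard.json` that sits beside the runs lacks the suffix, `parseBenchmarkPath("data/benchmarks/v0.2.0/default/leaderboard.json")` returns `null` and is not mistaken for a run.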
@@ -61,67 +75,88 @@ npx serve .
 The official leaderboard tracks performance across standardized seeds and configurations:

 - **Balatro Version**: v1.0.1n
-- **Seeds**: 100 consistent seeds for reproducibility
-- **Metrics**: Average ante reached, win rate, token efficiency
-- **Models**: GPT-4o, Claude-3.5-Sonnet, Gemini-Pro, and more
+- **Framework**: BalatroLLM v0.2.0+
+- **Strategy**: Template-based strategic prompting system
+- **Seeds**: Consistent seeds for reproducibility
+- **Metrics**: Average ante reached, win rate, token efficiency, completion rate
+- **Models**: Cerebras GPT-OSS-120B, Cerebras Qwen3-235B, and more

 ## 👥 Community Contributions

-### Submitting Your Strategy
+### Submitting Your Results

-1. **Fork this repository**
-2. **Create a strategy file** in `data/strategies/` following this format:
-
-```json
-{
-  "title": "Your Strategy Name",
-  "author": "YourUsername",
-  "model": "GPT-4o",
-  "score": "8.5",
-  "winRate": "75%",
-  "avgTokens": "15000",
-  "date": "2024-01-20",
-  "description": "Brief description of your approach",
-  "prompt": "Your full system prompt...",
-  "methodology": "Detailed explanation...",
-  "results": {
-    "seeds": [1, 2, 3],
-    "scores": [8.2, 8.8, 8.1]
-  },
-  "tags": ["tag1", "tag2"]
-}
-```
+There are two ways to contribute:
+
+#### Option 1: Submit Raw Benchmark Data (Recommended)
+
+1. **Run benchmarks** using the BalatroLLM framework
+2. **Add your results** to `data/benchmarks/v{version}/{strategy}/`
+   - Include the complete `{model}_benchmark.json` files
+   - These contain full game progression, LLM interactions, and tool calls
+3. **Process the data** using `node scripts/process-benchmarks.js`
+4. **Submit a Pull Request** with both raw data and updated leaderboard

-3. **Submit a Pull Request** with title: "Community Submission: [Your Strategy Name]"
+#### Option 2: Submit Strategy Documentation

-### Strategy Requirements
+1. **Fork this repository**
+2. **Create strategy templates** in `data/strategies/{your-strategy}/`
+   - Copy the structure from `data/strategies/default/`
+   - Customize the Jinja2 templates for your approach
+3. **Document your methodology** with clear explanations
+4. **Submit a Pull Request** with title: "Strategy Contribution: [Your Strategy Name]"
+
+### Submission Requirements

-- ✅ Valid benchmark results on standard seeds
+- ✅ Valid benchmark results from BalatroLLM v0.2.0+
+- ✅ Complete game progression data (not just summary statistics)
 - ✅ Clear strategy description and methodology
-- ✅ Reproducible results
-- ✅ Follows JSON schema format
+- ✅ Reproducible results with seed consistency
 - ✅ No offensive or inappropriate content

 ## 🛠️ Technologies Used

 - **HTML5** - Semantic markup
 - **Tailwind CSS** - Styling (via CDN)
 - **Vanilla JavaScript** - Dynamic content loading
+- **Node.js** - Benchmark processing scripts
+- **Jinja2 Templates** - Strategy prompt templating
 - **Font Awesome** - Icons (via CDN)
-- **JSON** - Data storage
+- **JSON** - Data storage and interchange

 ## 📊 Data Management

-All data is stored in JSON files for simplicity:
+BalatroBench uses a sophisticated data management system:
+
+### Raw Benchmark Data
+- `data/benchmarks/v{version}/{strategy}/` - Versioned benchmark results
+- Individual files: `{model}_benchmark.json` - Complete run data with game progression, LLM interactions, and tool calls
+- Structured by version and strategy for historical tracking
+
+### Processed Data
+- `data/leaderboard.json` - Auto-generated leaderboard from all benchmark data
+- Aggregated statistics: performance scores, win rates, token efficiency
+- Generated by `scripts/process-benchmarks.js`

-- `data/leaderboard.json` - Official benchmark results
-- `data/strategies/*.json` - Community submissions
+### Strategy System
+- `data/strategies/default/` - Template-based strategy system
+- Jinja2 templates for consistent prompting across models
+- Tool definitions and game state templates

-This approach allows for:
-- Version control of all data
-- Easy community contributions via PRs
+### Processing Pipeline
+```bash
+# Process raw benchmark data into leaderboard
+node scripts/process-benchmarks.js
+
+# Alternative analysis tool
+node scripts/analyze-benchmarks.js
+```
+
+This approach provides:
+- Version control of all data and results
+- Automated leaderboard generation
+- Historical benchmark tracking
+- Reproducible evaluation methodology
 - No database setup required
-- GitHub Pages compatibility

 ## 🤝 Contributing

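The Processed Data section says the leaderboard aggregates per-model statistics, and the official leaderboard lists average ante reached, win rate, token efficiency, and completion rate. A minimal sketch of that aggregation step follows, under several assumptions: the run fields `ant`-style names used here (`ante`, `won`, `completed`, `totalTokens`) are hypothetical and not the actual `{model}_benchmark.json` schema; token efficiency is approximated as mean tokens per run, echoing the removed `avgTokens` field; and ranking by average ante with win rate as a tie-breaker is an illustrative choice, not necessarily the site's ordering.

```javascript
// Hypothetical aggregation sketch. The run fields (ante, won, completed,
// totalTokens) are assumptions, not the real {model}_benchmark.json schema.
function summarizeRuns(model, runs) {
  if (runs.length === 0) {
    throw new Error(`no runs for ${model}`);
  }
  const n = runs.length;
  // Sum an arbitrary per-run quantity across all runs.
  const sum = (f) => runs.reduce((acc, r) => acc + f(r), 0);
  return {
    model,
    runs: n,
    avgAnte: sum((r) => r.ante) / n,
    winRate: sum((r) => (r.won ? 1 : 0)) / n,
    completionRate: sum((r) => (r.completed ? 1 : 0)) / n,
    // Token usage as a per-run mean (lower means more token-efficient).
    avgTokens: sum((r) => r.totalTokens) / n,
  };
}

// Rank entries by average ante reached, breaking ties by win rate.
// Copies the array first so the caller's input is not reordered.
function buildLeaderboard(entries) {
  return [...entries].sort(
    (a, b) => b.avgAnte - a.avgAnte || b.winRate - a.winRate
  );
}
```

Keeping per-run records intact and deriving every statistic from them is what lets the leaderboard be regenerated from raw data at any time, which is the point of requiring complete game progression rather than summary numbers.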
@@ -133,21 +168,61 @@ We welcome contributions! You can:

 ## 📈 Adding New Official Results

-To update the official leaderboard:
+To add new benchmark results:
+
+### Adding Raw Benchmark Data
+
+1. **Add benchmark files** to `data/benchmarks/v{version}/{strategy}/`
+   - Use format: `{model}_benchmark.json`
+   - Include complete run data from BalatroLLM framework
+
+2. **Process the data** to update leaderboards:
+```bash
+node scripts/process-benchmarks.js
+```

-1. Edit `data/leaderboard.json`
-2. Follow the existing schema
-3. Submit a pull request
+3. **Submit a pull request** with both raw data and updated leaderboard

-## 🔧 Customization
+### Data Processing Tools

-Want to customize the site?
+The project includes two processing scripts:
+
+- **`process-benchmarks.js`** - Primary tool for generating leaderboards from benchmark data
+- **`analyze-benchmarks.js`** - Alternative analysis tool with different aggregation methods
+
+Both scripts automatically scan the `data/benchmarks/` directory and process all available benchmark files.
+
+## 🔧 Development & Customization
+
+### Local Development Setup
+
+```bash
+# Clone the repository
+git clone <YOUR_GIT_URL>
+cd balatrobench
+
+# Install Node.js dependencies (for processing scripts)
+# No package.json yet - scripts use built-in Node.js modules
+
+# Serve the website locally
+python -m http.server 8000
+# or: npx serve .
+```
+
+### Customization Options

 - **Styling**: Modify Tailwind classes in HTML files
-- **Functionality**: Edit `js/app.js`
-- **Data**: Add/modify JSON files in `data/`
+- **Functionality**: Edit `js/app.js` for frontend behavior
+- **Data Processing**: Customize `scripts/process-benchmarks.js`
+- **Strategy Templates**: Add new templates in `data/strategies/`
 - **Pages**: Create new HTML files following the existing pattern

+### Processing Scripts
+
+- **Process benchmarks**: `node scripts/process-benchmarks.js`
+- **Alternative analysis**: `node scripts/analyze-benchmarks.js`
+- Both scripts output to `data/leaderboard.json`
+
 ## 📜 License

 This project is open source. Feel free to use, modify, and distribute.
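The Data Processing Tools section says both scripts scan `data/benchmarks/` and process every benchmark file they find. Using only built-in Node.js modules (the setup notes state the scripts need no dependencies), such a scan could look like the sketch below. The function names `findBenchmarkFiles` and `loadBenchmarks` are hypothetical, and matching on the `_benchmark.json` suffix is an assumption drawn from the README's naming convention rather than from the scripts themselves.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical scan of data/benchmarks/: walk the tree recursively and collect
// every *_benchmark.json file. Processed leaderboard.json files do not match.
function findBenchmarkFiles(rootDir) {
  const found = [];
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile() && entry.name.endsWith("_benchmark.json")) {
        found.push(full);
      }
    }
  };
  walk(rootDir);
  return found.sort(); // deterministic order regardless of filesystem listing
}

// Read and parse each file; one that cannot be read or parsed aborts the run
// with its path in the error message.
function loadBenchmarks(rootDir) {
  return findBenchmarkFiles(rootDir).map((file) => {
    try {
      return { file, data: JSON.parse(fs.readFileSync(file, "utf8")) };
    } catch (err) {
      throw new Error(`failed to load ${file}: ${err.message}`);
    }
  });
}
```

A call such as `loadBenchmarks("data/benchmarks")` would then feed every version and strategy into one pass. Sorting the paths keeps the processing order stable, so regenerating the leaderboard from the same files should produce the same output.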
