| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +BalatroBench is a static web application that displays performance leaderboards for LLMs playing the card game Balatro. It is a frontend-only project with no build step: the site uses vanilla JavaScript, with Tailwind CSS loaded from a CDN. |
| 8 | + |
| 9 | +## Architecture |
| 10 | + |
| 11 | +### Core Components |
| 12 | + |
| 13 | +- **index.html**: Main leaderboard page with responsive table layout using Tailwind CSS |
| 14 | +- **script.js**: Fetches and renders leaderboard data from JSON files in the data directory |
| 15 | +- **data/**: Contains benchmark results organized by version and strategy |
| 16 | + - `data/benchmarks/v0.8.0/default/leaderboard.json`: Primary leaderboard data |
| 17 | + - Individual model result files in vendor subdirectories (e.g., `openai/gpt-oss-120b.json`) |
| 18 | + |
| 19 | +### Data Structure |
| 20 | + |
| 21 | +The leaderboard displays AI model performance with metrics including: |
| 22 | +- Final round reached (with standard deviation) |
| 23 | +- Success/failure/error rates for API calls |
| 24 | +- Token usage (input/output) |
| 25 | +- Execution time and cost per game |
| 26 | +- Usage statistics across multiple providers |
| 27 | + |
| 28 | +Models are identified by `vendor/model` format and ranked by performance metrics. |
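The "final round reached (with standard deviation)" metric can be illustrated with a small sketch. The function name `summarize` is hypothetical, and the use of the population standard deviation (dividing by `n`) is an assumption; this file does not say which variant the benchmark uses.

```javascript
// Hypothetical helper: summarizes the final round reached across several runs.
// Assumption: population standard deviation (divide by n, not n - 1).
function summarize(finalRounds) {
  if (finalRounds.length === 0) {
    throw new Error("need at least one run");
  }
  const n = finalRounds.length;
  const mean = finalRounds.reduce((sum, v) => sum + v, 0) / n;
  const variance =
    finalRounds.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  return { mean, std: Math.sqrt(variance) };
}
```

If the published numbers turn out to use the sample standard deviation, the only change needed is dividing by `n - 1` instead of `n`.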
| 29 | + |
| 30 | +## Development Commands |
| 31 | + |
| 32 | +### Local Development |
| 33 | + |
| 34 | +```bash |
| 35 | +# Serve the application locally (Python 3) |
| 36 | +python3 -m http.server 8000 |
| 37 | + |
| 38 | +# Then visit http://localhost:8000 |
| 39 | +``` |
| 40 | + |
| 41 | +### File Structure Conventions |
| 42 | + |
| 43 | +- All files use UTF-8 encoding with LF line endings |
| 44 | +- JavaScript: 2-space indentation, 100 character line limit |
| 45 | +- HTML: 2-space indentation, 120 character line limit |
| 46 | +- JSON: 2-space indentation |
| 47 | + |
| 48 | +## Data Management |
| 49 | + |
| 50 | +### Adding New Results |
| 51 | + |
| 52 | +- Leaderboard data is loaded from `data/benchmarks/v0.8.0/default/leaderboard.json` |
| 53 | +- Individual model results are stored in `data/benchmarks/v0.8.0/default/[vendor]/[model].json` |
| 54 | +- The application automatically parses `vendor/model` from the `config.model` field |
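The mapping from a `vendor/model` identifier to its result file can be sketched as follows. This is a minimal illustration, not the code in `script.js`: the names `RESULTS_ROOT`, `parseModelId`, and `resultPath` are hypothetical, and splitting on the first `/` is an assumption about how the identifier is parsed.

```javascript
// Directory layout taken from this file; the constant name is hypothetical.
const RESULTS_ROOT = "data/benchmarks/v0.8.0/default";

// Hypothetical helper: splits "vendor/model" at the first slash (an assumption).
function parseModelId(modelId) {
  const slash = modelId.indexOf("/");
  if (slash <= 0 || slash === modelId.length - 1) {
    throw new Error(`Expected "vendor/model", got "${modelId}"`);
  }
  return {
    vendor: modelId.slice(0, slash),
    model: modelId.slice(slash + 1),
  };
}

// Hypothetical helper: builds the path of an individual model result file.
function resultPath(modelId) {
  const { vendor, model } = parseModelId(modelId);
  return `${RESULTS_ROOT}/${vendor}/${model}.json`;
}
```

For example, `resultPath("openai/gpt-oss-120b")` yields `data/benchmarks/v0.8.0/default/openai/gpt-oss-120b.json`, which matches the directory layout described above.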
| 55 | + |
| 56 | +### Result Format |
| 57 | + |
| 58 | +Each entry contains: |
| 59 | +- Run statistics (runs, wins, completion rate) |
| 60 | +- Performance averages and standard deviations |
| 61 | +- API call success metrics |
| 62 | +- Provider usage breakdown |
| 63 | +- Token usage and cost analysis |
| 64 | + |
| 65 | +## Community Contributions |
| 66 | + |
| 67 | +The project accepts AI strategy submissions through a community form process. Contributors develop strategies using the `balatrollm` framework, test locally, then submit via web form for automated server validation and leaderboard inclusion. |
| 68 | + |
| 69 | +## Static Hosting |
| 70 | + |
| 71 | +This is a pure client-side application that requires no backend server, so it can be deployed to static hosting services such as GitHub Pages, Netlify, or Vercel. |