Commit cd02c3d

docs: update CLAUDE.md
1 parent 61d287c, commit cd02c3d

1 file changed

Lines changed: 44 additions & 17 deletions

CLAUDE.md


    @@ -4,29 +4,41 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

     ## Project Overview

    -BalatroBench is a static web application that displays performance leaderboards for LLMs playing the card game Balatro. It's a frontend-only project without build tools - the site uses vanilla JavaScript with Tailwind CSS loaded from CDN.
    +BalatroBench is a static web application that displays performance leaderboards for LLMs playing the card game Balatro. It's a frontend-only project without build tools - the site uses vanilla JavaScript with Tailwind CSS and Chart.js loaded from CDN.

     ## Architecture

     ### Core Components

     - **index.html**: Main leaderboard page with responsive table layout using Tailwind CSS
    -- **script.js**: Fetches and renders leaderboard data from JSON files in the data directory
    -- **data/**: Contains benchmark results organized by version and strategy
    -  - `data/benchmarks/v0.8.0/default/leaderboard.json`: Primary leaderboard data
    +- **script.js**: Fetches and renders leaderboard data, with interactive expandable rows showing detailed charts and statistics
    +- **data/**: Contains benchmark results organized by version, strategy, and data type
    +  - `data/benchmarks/v0.8.0/default/leaderboard.json`: Primary model leaderboard data
    +  - `data/community/v0.8.0/default/leaderboard.json`: Community strategy leaderboard data
       - Individual model result files in vendor subdirectories (e.g., `openai/gpt-oss-120b.json`)

     ### Data Structure

     The leaderboard displays AI model performance with metrics including:
     - Final round reached (with standard deviation)
     - Success/failure/error rates for API calls
    -- Token usage (input/output)
    -- Execution time and cost per game
    -- Multiple provider usage statistics
    +- Token usage (input/output with standard deviations)
    +- Execution time and cost per game (with standard deviations)
    +- Provider usage distribution
    +- Detailed per-game statistics and histograms

     Models are identified by `vendor/model` format and ranked by performance metrics.

    +### Interactive Features
    +
    +- **Expandable Rows**: Click on desktop (lg+) to expand detailed view with:
    +  - Round distribution histogram using Chart.js
    +  - Provider usage pie chart
    +  - Complete per-game statistics table
    +  - Total aggregated metrics (tokens, costs, time)
    +- **Responsive Design**: Columns hide/show based on screen size
    +- **Dual Display Modes**: Support for both model-based and community strategy leaderboards
    +
     ## Development Commands

     ### Local Development
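
The updated CLAUDE.md identifies models as `vendor/model` and splits the data into a model leaderboard under `data/benchmarks/v0.8.0/default/` and a community leaderboard under `data/community/v0.8.0/default/`, each with per-model files in vendor subdirectories such as `openai/gpt-oss-120b.json`. A minimal sketch of how those paths could be derived from a model identifier follows. It is not taken from `script.js`: the function names and the `DATA_VERSION` and `STRATEGY` constants are hypothetical, and splitting at the first `/` is an assumption about how identifiers are formed. Keeping the version and strategy in one place would make moving past `v0.8.0` a one-line change.

```javascript
// Hypothetical helpers illustrating the data layout described in CLAUDE.md.
// None of these names come from script.js.

const DATA_VERSION = "v0.8.0"; // version directory named in CLAUDE.md
const STRATEGY = "default"; // strategy directory named in CLAUDE.md

// Split "vendor/model" at the first slash, e.g. "openai/gpt-oss-120b".
function parseModelId(id) {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "vendor/model", got "${id}"`);
  }
  return { vendor: id.slice(0, slash), model: id.slice(slash + 1) };
}

// Root directory for the model leaderboard ("benchmarks") or the community one.
function dataRoot(kind) {
  if (kind !== "benchmarks" && kind !== "community") {
    throw new Error(`Unknown leaderboard kind: ${kind}`);
  }
  return `data/${kind}/${DATA_VERSION}/${STRATEGY}`;
}

// Path of the aggregated leaderboard file for either kind.
function leaderboardUrl(kind) {
  return `${dataRoot(kind)}/leaderboard.json`;
}

// Path of a per-model detail file,
// e.g. data/benchmarks/v0.8.0/default/openai/gpt-oss-120b.json
function modelDetailUrl(kind, id) {
  const { vendor, model } = parseModelId(id);
  return `${dataRoot(kind)}/${vendor}/${model}.json`;
}

// Browser usage (not run here): load and log the community leaderboard.
// fetch(leaderboardUrl("community")).then((res) => res.json()).then(console.log);
```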

    @@ -38,6 +50,12 @@ python3 -m http.server 8000
     # Then visit http://localhost:8000
     ```

    +### Dependencies
    +
    +- **Tailwind CSS**: Styling framework loaded from CDN
    +- **Chart.js**: Charting library for histograms and pie charts
    +- **Heroicons**: Icon library (included but minimal usage in current implementation)
    +
     ### File Structure Conventions

     - All files use UTF-8 encoding with LF line endings
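
The new Dependencies section lists Chart.js for the round-distribution histogram and the provider usage pie chart shown in expanded rows. The sketch below, under assumed input shapes (an array of integer final rounds and an object mapping provider names to call counts), turns that data into Chart.js configuration objects with `type`, `data.labels`, and `data.datasets`. The function names and the element id in the usage comment are hypothetical, and `script.js` may build its charts differently. Returning plain objects keeps the data preparation testable outside the browser; with Chart.js loaded from the CDN, a config would be passed to `new Chart(canvas, config)`.

```javascript
// Hypothetical helpers: build Chart.js config objects for the expanded-row charts.
// Input shapes are assumptions; script.js may organise its data differently.

// Bar chart with one bar per round between the lowest and highest final round.
function roundHistogramConfig(finalRounds) {
  if (!finalRounds.every(Number.isInteger)) {
    throw new Error("Final rounds must be integers");
  }
  const labels = [];
  const counts = [];
  if (finalRounds.length > 0) {
    const min = Math.min(...finalRounds);
    const max = Math.max(...finalRounds);
    // Include empty rounds so gaps in the distribution stay visible.
    for (let round = min; round <= max; round++) {
      labels.push(String(round));
      counts.push(0);
    }
    for (const round of finalRounds) {
      counts[round - min] += 1;
    }
  }
  return {
    type: "bar",
    data: { labels, datasets: [{ label: "Games", data: counts }] },
  };
}

// Pie chart with one slice per provider; `providers` is assumed to map name -> call count.
function providerPieConfig(providers) {
  const labels = Object.keys(providers);
  return {
    type: "pie",
    data: { labels, datasets: [{ label: "Calls", data: labels.map((name) => providers[name]) }] },
  };
}

// Browser usage (not run here); "round-histogram" is a hypothetical canvas id:
// new Chart(document.getElementById("round-histogram"), roundHistogramConfig([3, 5, 5, 4]));
```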

    @@ -47,20 +65,29 @@ python3 -m http.server 8000

     ## Data Management

    -### Adding New Results
    +### Data Organization
    +
    +**Benchmark Data** (`data/benchmarks/v0.8.0/default/`):
    +- `leaderboard.json`: Aggregated model performance data
    +- `[vendor]/[model].json`: Detailed individual model results with per-game statistics

    -- Leaderboard data is loaded from `data/benchmarks/v0.8.0/default/leaderboard.json`
    -- Individual model results stored in `data/benchmarks/v0.8.0/default/[vendor]/[model].json`
    -- The application automatically parses `vendor/model` from the `config.model` field
    +**Community Data** (`data/community/v0.8.0/default/`):
    +- `leaderboard.json`: Community strategy performance data
    +- `[vendor]/[model].json`: Detailed strategy results

     ### Result Format

    -Each entry contains:
    -- Run statistics (runs, wins, completion rate)
    -- Performance averages and standard deviations
    -- API call success metrics
    -- Provider usage breakdown
    -- Token usage and cost analysis
    +**Leaderboard entries** contain:
    +- Model/strategy configuration and metadata
    +- Aggregated performance averages and standard deviations
    +- API call success/failure/error rates
    +- Token usage and cost summaries
    +
    +**Detailed model files** contain:
    +- Individual game statistics (`stats` array)
    +- Provider usage distribution (`providers` object)
    +- Total aggregated metrics (`total` object)
    +- Per-game breakdowns including final round, token usage, timing, and costs

     ## Community Contributions
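
The revised Result Format section names the `stats` array, `providers` object, and `total` object of the detailed model files, but not their inner fields. Assuming each `stats` entry has a numeric `final_round` field and `providers` maps a provider name to a call count (both hypothetical), the per-model numbers shown in an expanded row could be recomputed roughly as follows. The function name is also hypothetical. CLAUDE.md does not say whether its standard deviations use the sample or the population estimator; this sketch uses the sample form (dividing by n - 1), so its values may differ slightly from those in `leaderboard.json`.

```javascript
// Hypothetical summary of a detailed model file ([vendor]/[model].json).
// CLAUDE.md names the `stats` array and the `providers` object; the per-game
// `final_round` field and the name -> count shape of `providers` are assumptions.
function summarizeModelFile(detail) {
  const rounds = detail.stats.map((game) => game.final_round);
  const n = rounds.length;
  const mean = n > 0 ? rounds.reduce((sum, r) => sum + r, 0) / n : NaN;

  // Sample standard deviation (divides by n - 1); NaN when there are no games.
  let variance = NaN;
  if (n === 1) {
    variance = 0;
  } else if (n > 1) {
    variance = rounds.reduce((acc, r) => acc + (r - mean) ** 2, 0) / (n - 1);
  }

  // Fraction of all API calls handled by each provider.
  const entries = Object.entries(detail.providers);
  const totalCalls = entries.reduce((sum, [, count]) => sum + count, 0);
  const providerShares = {};
  for (const [name, count] of entries) {
    providerShares[name] = totalCalls > 0 ? count / totalCalls : 0;
  }

  return { games: n, meanRound: mean, stdRound: Math.sqrt(variance), providerShares };
}
```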
