Commit 18975e4

Deployed 7807fbe with MkDocs version: 1.6.1
0 parents  commit 18975e4

57 files changed

Lines changed: 11760 additions & 0 deletions

.nojekyll

Whitespace-only changes.

404.html

Lines changed: 402 additions & 0 deletions
Large diffs are not rendered by default.

analysis/index.html

Lines changed: 637 additions & 0 deletions

analysis/index.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# Analysis

Generate comprehensive benchmarks, analyze performance metrics, and integrate with BalatroBench for detailed statistics visualization.

## Benchmark Generation

Run benchmarks to evaluate model performance:

```bash
balatrollm benchmark
```

## Benchmark Results

Benchmarks are organized hierarchically:

```text
benchmarks/
├── v0.10.0/  # Version
│   ├── default/  # Strategy
│   │   ├── leaderboard.json  # Strategy summary with aggregated stats
│   │   └── openai/  # Provider
│   │       ├── gpt-oss-20b.json  # Model performance summary
│   │       ├── gpt-oss-120b.json  # Model performance summary
│   │       ├── gpt-oss-20b/  # Individual runs for model
│   │       │   ├── 20250922_124308_887_RedDeck_s1__OOOO155/
│   │       │   │   ├── request-00001/  # Individual LLM request
│   │       │   │   │   ├── reasoning.md  # LLM reasoning process
│   │       │   │   │   ├── request.md  # Full request sent to LLM
│   │       │   │   │   ├── screenshot.png  # Game state screenshot
│   │       │   │   │   └── tool_call.json  # Function call details
│   │       │   │   ├── request-00002/
│   │       │   │   └── ...
│   │       │   └── [other runs]
│   │       └── gpt-oss-120b/  # Individual runs for model
│   │           └── [similar structure]
│   └── aggressive/  # Other strategies
│       └── [similar structure]
```
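
The hierarchy above can be walked programmatically. The following is a minimal sketch, assuming only the `<version>/<strategy>/<provider>/<model>.json` layout shown in the tree; the function names `list_model_summaries` and `load_summary` are hypothetical helpers, not part of `balatrollm`, and nothing is assumed about the JSON schema of the summary files.

```python
import json
from pathlib import Path


def list_model_summaries(root: Path) -> list[dict]:
    """Collect per-model summary files from a benchmarks/ tree.

    Assumes the layout <root>/<version>/<strategy>/<provider>/<model>.json.
    """
    summaries = []
    # Exactly four path components below root, so leaderboard.json (three
    # components) and per-request files such as tool_call.json (deeper)
    # are not matched.
    for path in sorted(root.glob("*/*/*/*.json")):
        version, strategy, provider = path.relative_to(root).parts[:3]
        summaries.append(
            {
                "version": version,
                "strategy": strategy,
                "provider": provider,
                "model": path.stem,
                "path": path,
            }
        )
    return summaries


def load_summary(path: Path) -> object:
    """Parse one summary file without assuming anything about its schema."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Calling `list_model_summaries(Path("benchmarks"))` on the tree shown above would be expected to return one entry each for `gpt-oss-20b` and `gpt-oss-120b` under `v0.10.0/default/openai`.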

## BalatroBench Integration

### Overview

[BalatroBench](https://s1m0n38.github.io/balatrobench/) is a web-based dashboard for visualizing and comparing LLM performance in Balatro. It provides interactive charts, leaderboards, and detailed analytics.

### Integrating with BalatroBench

To use BalatroBench as a local dashboard for visualizing your benchmark results:

```bash
# Step 1: Generate runs with custom output directory
balatrollm --runs-dir example-runs --runs 20

# Step 2: Generate benchmark analysis
balatrollm benchmark --runs-dir example-runs --output-dir example-benchmark

# Step 3: Clone BalatroBench repository
git clone https://github.com/S1M0N38/balatrobench.git /path/to/balatrobench

# Step 4: Move benchmark data to BalatroBench (or create symbolic link)
mv example-benchmark/benchmarks /path/to/balatrobench/data/benchmarks
# OR create a symbolic link:
# ln -s $(pwd)/example-benchmark/benchmarks /path/to/balatrobench/data/benchmarks

# Step 5: Host BalatroBench locally
cd /path/to/balatrobench
python3 -m http.server 8001
# Then visit http://localhost:8001
```
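
The symbolic-link variant of Step 4 can also be scripted. This is a minimal sketch, assuming the `data/benchmarks` destination used in the commands above; `link_benchmarks` is a hypothetical helper, not something provided by `balatrollm` or BalatroBench.

```python
from pathlib import Path


def link_benchmarks(benchmarks_dir: Path, balatrobench_dir: Path) -> Path:
    """Symlink a generated benchmarks/ directory into a BalatroBench checkout.

    Mirrors `ln -s $(pwd)/example-benchmark/benchmarks <checkout>/data/benchmarks`.
    """
    # Resolve to an absolute path so the link stays valid from any working directory.
    source = benchmarks_dir.resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"benchmark directory not found: {source}")

    target = balatrobench_dir / "data" / "benchmarks"
    # is_symlink() also catches a dangling link, which exists() reports as missing.
    if target.exists() or target.is_symlink():
        raise FileExistsError(f"{target} already exists; remove it first")

    target.parent.mkdir(parents=True, exist_ok=True)
    target.symlink_to(source, target_is_directory=True)
    return target
```

Refusing to overwrite an existing `data/benchmarks` is a deliberate choice in this sketch: with plain `mv` or `ln -s`, an existing destination directory causes the new data to be nested inside it rather than replacing it.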

assets/images/favicon.png

1.83 KB

assets/javascripts/bundle.f55a23d4.min.js

Lines changed: 16 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

assets/javascripts/bundle.f55a23d4.min.js.map

Lines changed: 7 additions & 0 deletions

assets/javascripts/lunr/min/lunr.ar.min.js

Lines changed: 1 addition & 0 deletions

assets/javascripts/lunr/min/lunr.da.min.js

Lines changed: 18 additions & 0 deletions

assets/javascripts/lunr/min/lunr.de.min.js

Lines changed: 18 additions & 0 deletions
