Commit bf0b525

refactor: remove submit.html
1 parent ee40756 commit bf0b525

5 files changed

Lines changed: 292 additions & 215 deletions

README.md

Lines changed: 2 additions & 1 deletion
@@ -44,7 +44,8 @@ npx serve .
4444
```
4545
├── index.html # Main page (Official Benchmark)
4646
├── community.html # Community submissions page
47-
├── submit.html # Submission guidelines
47+
├── submit.html # Redirects to CONTRIBUTING.md
48+
├── CONTRIBUTING.md # Detailed submission guidelines
4849
├── js/
4950
│ └── app.js # JavaScript for data loading
5051
├── data/

about.html

Lines changed: 251 additions & 0 deletions
@@ -0,0 +1,251 @@
1+
<!DOCTYPE html>
2+
<html lang="en">
3+
<head>
4+
<meta charset="UTF-8">
5+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
6+
<title>About BalatroBench - LLM Benchmark Platform</title>
7+
<meta name="description" content="Learn about BalatroBench, the AI benchmark platform that evaluates Large Language Models through strategic Balatro gameplay.">
8+
<script src="https://cdn.tailwindcss.com"></script>
9+
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
10+
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0/css/all.min.css">
11+
</head>
12+
<body class="bg-gray-900 text-white min-h-screen">
13+
<!-- Navigation -->
14+
<nav class="bg-gray-800 border-b border-gray-700">
15+
<div class="container mx-auto px-4 py-4">
16+
<div class="flex items-center justify-between">
17+
<div class="flex items-center space-x-6">
18+
<h1 class="text-xl font-bold text-blue-400">BalatroBench</h1>
19+
<div class="hidden md:flex space-x-4">
20+
<a href="index.html" class="text-gray-300 hover:text-white">Leaderboard</a>
21+
<a href="community.html" class="text-gray-300 hover:text-white">Community</a>
22+
<a href="about.html" class="text-blue-400 hover:text-blue-300">About</a>
23+
</div>
24+
</div>
25+
</div>
26+
</div>
27+
</nav>
28+
29+
<div class="container mx-auto px-4 py-8 max-w-4xl">
30+
<!-- Hero Section -->
31+
<div class="mb-12">
32+
<h1 class="text-4xl font-bold bg-gradient-to-r from-blue-400 to-purple-400 bg-clip-text text-transparent mb-6">
33+
About BalatroBench
34+
</h1>
35+
<p class="text-gray-300 text-lg leading-relaxed">
36+
BalatroBench is a benchmark platform that evaluates Large Language Models through strategic gameplay in Balatro,
37+
a roguelike poker deck-building game. By testing AI decision-making in a complex, dynamic environment, we measure models'
38+
strategic thinking, tool usage, and adaptive problem-solving capabilities.
39+
</p>
40+
</div>
41+
42+
<!-- What is Balatro Section -->
43+
<div class="bg-gray-800 rounded-lg border border-gray-700 mb-8">
44+
<div class="p-6 border-b border-gray-700">
45+
<h2 class="text-2xl font-bold text-white flex items-center">
46+
<i class="fas fa-gamepad text-purple-400 mr-3"></i>
47+
What is Balatro?
48+
</h2>
49+
</div>
50+
<div class="p-6">
51+
<p class="text-gray-300 mb-4">
52+
Balatro is a sophisticated roguelike poker deck-building game where players defeat "blinds" by playing poker hands to score chips.
53+
The game combines traditional poker mechanics with strategic deck building, resource management, and synergistic card effects.
54+
</p>
55+
<div class="grid md:grid-cols-2 gap-6">
56+
<div>
57+
<h3 class="font-semibold text-white mb-3">Core Mechanics</h3>
58+
<ul class="space-y-2 text-sm text-gray-300">
59+
<li><strong>Scoring Formula:</strong> Chips × Multiplier = Final Score</li>
60+
<li><strong>Ante Structure:</strong> Sets of 3 blinds (Small → Big → Boss)</li>
61+
<li><strong>Limited Resources:</strong> Hands and discards per round</li>
62+
<li><strong>Progressive Difficulty:</strong> Each ante increases challenge</li>
63+
</ul>
64+
</div>
65+
<div>
66+
<h3 class="font-semibold text-white mb-3">Strategic Elements</h3>
67+
<ul class="space-y-2 text-sm text-gray-300">
68+
<li><strong>Joker Synergies:</strong> Cards that modify scoring</li>
69+
<li><strong>Money Management:</strong> Interest system and economics</li>
70+
<li><strong>Hand Type Focus:</strong> Specializing in specific poker hands</li>
71+
<li><strong>Boss Adaptation:</strong> Preparing for unique challenges</li>
72+
</ul>
73+
</div>
74+
</div>
75+
</div>
76+
</div>
77+
78+
<!-- How It Works Section -->
79+
<div class="bg-gray-800 rounded-lg border border-gray-700 mb-8">
80+
<div class="p-6 border-b border-gray-700">
81+
<h2 class="text-2xl font-bold text-white flex items-center">
82+
<i class="fas fa-cogs text-blue-400 mr-3"></i>
83+
How BalatroBench Works
84+
</h2>
85+
</div>
86+
<div class="p-6">
87+
<div class="grid gap-6">
88+
<div class="bg-gray-700 rounded-lg p-4">
89+
<h3 class="font-semibold text-white mb-2 flex items-center">
90+
<span class="bg-blue-500 w-8 h-8 rounded-full flex items-center justify-center text-sm mr-3">1</span>
91+
AI Agent Integration
92+
</h3>
93+
<p class="text-gray-300 text-sm">
94+
Language models are integrated with <strong>BalatroLLM</strong>, an intelligent bot that interfaces with the game through
95+
strategic decision-making tools. The AI receives game state information and must make optimal choices.
96+
</p>
97+
</div>
98+
<div class="bg-gray-700 rounded-lg p-4">
99+
<h3 class="font-semibold text-white mb-2 flex items-center">
100+
<span class="bg-purple-500 w-8 h-8 rounded-full flex items-center justify-center text-sm mr-3">2</span>
101+
Strategic Decision Making
102+
</h3>
103+
<p class="text-gray-300 text-sm">
104+
Models must balance multiple strategic priorities: money management, joker synergies, hand type focus,
105+
boss blind preparation, and resource optimization. Each decision impacts long-term success.
106+
</p>
107+
</div>
108+
<div class="bg-gray-700 rounded-lg p-4">
109+
<h3 class="font-semibold text-white mb-2 flex items-center">
110+
<span class="bg-green-500 w-8 h-8 rounded-full flex items-center justify-center text-sm mr-3">3</span>
111+
Performance Evaluation
112+
</h3>
113+
<p class="text-gray-300 text-sm">
114+
Results are measured across consistent seed sets, evaluating average ante reached, win rates,
115+
token efficiency, and strategic decision quality over multiple runs.
116+
</p>
117+
</div>
118+
</div>
119+
</div>
120+
</div>
121+
122+
<!-- Methodology Section -->
123+
<div class="bg-gray-800 rounded-lg border border-gray-700 mb-8">
124+
<div class="p-6 border-b border-gray-700">
125+
<h2 class="text-2xl font-bold text-white flex items-center">
126+
<i class="fas fa-chart-line text-green-400 mr-3"></i>
127+
Evaluation Methodology
128+
</h2>
129+
</div>
130+
<div class="p-6">
131+
<div class="grid md:grid-cols-2 gap-6">
132+
<div>
133+
<h3 class="font-semibold text-white mb-3">Benchmark Parameters</h3>
134+
<ul class="space-y-2 text-sm text-gray-300">
135+
<li><strong>Game Version:</strong> Balatro v1.0.1n</li>
136+
<li><strong>Consistency:</strong> 100 predefined seeds</li>
137+
<li><strong>Configuration:</strong> Standard deck, no modifications</li>
138+
<li><strong>Reproducibility:</strong> Deterministic testing environment</li>
139+
</ul>
140+
</div>
141+
<div>
142+
<h3 class="font-semibold text-white mb-3">Evaluation Metrics</h3>
143+
<ul class="space-y-2 text-sm text-gray-300">
144+
<li><strong>Average Ante Reached:</strong> Progression depth</li>
145+
<li><strong>Win Rate:</strong> Success across seed variations</li>
146+
<li><strong>Token Efficiency:</strong> Resource usage optimization</li>
147+
<li><strong>Decision Quality:</strong> Strategic choice evaluation</li>
148+
</ul>
149+
</div>
150+
</div>
151+
</div>
152+
</div>
153+
154+
<!-- Why This Matters Section -->
155+
<div class="bg-gray-800 rounded-lg border border-gray-700 mb-8">
156+
<div class="p-6 border-b border-gray-700">
157+
<h2 class="text-2xl font-bold text-white flex items-center">
158+
<i class="fas fa-lightbulb text-yellow-400 mr-3"></i>
159+
Why This Benchmark Matters
160+
</h2>
161+
</div>
162+
<div class="p-6">
163+
<div class="grid gap-4 text-gray-300">
164+
<div>
165+
<h3 class="font-semibold text-white mb-2">Beyond Traditional Benchmarks</h3>
166+
<p class="text-sm">
167+
While most AI benchmarks focus on static question-answering or pattern recognition, BalatroBench evaluates
168+
dynamic decision-making in complex, multi-step scenarios that require strategic thinking and adaptation.
169+
</p>
170+
</div>
171+
<div>
172+
<h3 class="font-semibold text-white mb-2">Real-World Problem Solving</h3>
173+
<p class="text-sm">
174+
The skills tested—resource management, long-term planning, risk assessment, and synergistic thinking—
175+
directly translate to real-world applications in business strategy, project management, and complex problem solving.
176+
</p>
177+
</div>
178+
<div>
179+
<h3 class="font-semibold text-white mb-2">Tool-Using AI Assessment</h3>
180+
<p class="text-sm">
181+
BalatroBench specifically evaluates models' ability to use tools effectively, make strategic decisions with incomplete
182+
information, and optimize for long-term success rather than immediate rewards.
183+
</p>
184+
</div>
185+
</div>
186+
</div>
187+
</div>
188+
189+
<!-- Technical Details Section -->
190+
<div class="bg-gray-800 rounded-lg border border-gray-700 mb-8">
191+
<div class="p-6 border-b border-gray-700">
192+
<h2 class="text-2xl font-bold text-white flex items-center">
193+
<i class="fas fa-code text-blue-400 mr-3"></i>
194+
Technical Implementation
195+
</h2>
196+
</div>
197+
<div class="p-6">
198+
<p class="text-gray-300 mb-4">
199+
BalatroBench is powered by <strong>BalatroLLM</strong>, an open-source Python framework that integrates
200+
Large Language Models with Balatro through the BalatroBot interface.
201+
</p>
202+
<div class="bg-gray-700 rounded-lg p-4">
203+
<h3 class="font-semibold text-white mb-3">Key Components</h3>
204+
<ul class="space-y-2 text-sm text-gray-300">
205+
<li><strong>BalatroBot:</strong> Game interface and automation layer</li>
206+
<li><strong>LLM Integration:</strong> OpenAI-compatible API with strategic prompting</li>
207+
<li><strong>Template System:</strong> Jinja2-based prompt engineering</li>
208+
<li><strong>Evaluation Pipeline:</strong> Automated testing and result aggregation</li>
209+
</ul>
210+
</div>
211+
<div class="mt-4">
212+
<a href="https://github.com/S1M0N38/balatrollm" target="_blank"
213+
class="inline-flex items-center text-blue-400 hover:text-blue-300 transition-colors">
214+
<i class="fab fa-github mr-2"></i>
215+
View on GitHub
216+
<i class="fas fa-external-link-alt ml-1 text-xs"></i>
217+
</a>
218+
</div>
219+
</div>
220+
</div>
221+
222+
<!-- Get Involved Section -->
223+
<div class="bg-gradient-to-r from-blue-500/10 to-purple-500/10 rounded-lg border border-blue-500/20 p-6">
224+
<h2 class="text-2xl font-bold text-white mb-4 flex items-center">
225+
<i class="fas fa-users text-purple-400 mr-3"></i>
226+
Get Involved
227+
</h2>
228+
<p class="text-gray-300 mb-6">
229+
BalatroBench is a community-driven platform. Whether you're an AI researcher, developer, or strategic gaming enthusiast,
230+
there are multiple ways to contribute to advancing LLM evaluation.
231+
</p>
232+
<div class="flex flex-wrap gap-4">
233+
<a href="https://github.com/S1M0N38/balatrobench/blob/main/CONTRIBUTING.md" target="_blank"
234+
class="bg-gradient-to-r from-blue-500 to-purple-600 hover:from-blue-600 hover:to-purple-700 text-white px-6 py-3 rounded-lg font-medium transition-all duration-200 shadow-lg">
235+
<i class="fas fa-upload mr-2"></i>Submit Your Strategy
236+
</a>
237+
<a href="community.html"
238+
class="border border-gray-600 text-gray-300 hover:border-gray-500 hover:text-white px-6 py-3 rounded-lg font-medium transition-colors">
239+
<i class="fas fa-comments mr-2"></i>Join Community
240+
</a>
241+
<a href="https://github.com/S1M0N38/balatrollm" target="_blank"
242+
class="border border-gray-600 text-gray-300 hover:border-gray-500 hover:text-white px-6 py-3 rounded-lg font-medium transition-colors">
243+
<i class="fab fa-github mr-2"></i>Contribute Code
244+
</a>
245+
</div>
246+
</div>
247+
</div>
248+
249+
<script src="js/app.js"></script>
250+
</body>
251+
</html>
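
Note on the "Evaluation Metrics" list in about.html: the page names average ante reached and win rate but does not show how they are derived. The following is a minimal sketch of one possible aggregation, in JavaScript to match `js/app.js`. The record shape (`seed`, `anteReached`, `won`), the function name `summarizeRuns`, and the sample values are all assumptions made for illustration; they are not taken from the BalatroBench data files or from BalatroLLM.

```javascript
// Hypothetical run records, one per seed. Field names are illustrative only
// and are not the actual BalatroBench data format.
const sampleRuns = [
  { seed: "AAAA", anteReached: 4, won: false },
  { seed: "BBBB", anteReached: 8, won: true },
  { seed: "CCCC", anteReached: 6, won: false },
  { seed: "DDDD", anteReached: 2, won: false },
];

// Aggregate a list of runs into the two headline metrics.
// Returns zeros for an empty list to avoid dividing by zero.
function summarizeRuns(runs) {
  if (runs.length === 0) {
    return { runs: 0, averageAnte: 0, winRate: 0 };
  }
  const totalAnte = runs.reduce((sum, run) => sum + run.anteReached, 0);
  const wins = runs.filter((run) => run.won).length;
  return {
    runs: runs.length,
    averageAnte: totalAnte / runs.length, // mean progression depth
    winRate: wins / runs.length,          // fraction of seeds won
  };
}

const summary = summarizeRuns(sampleRuns);
console.log(summary);
```

Taking `won` as an explicit field, rather than deriving it from the ante, keeps the sketch independent of how the benchmark actually defines a win.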

community.html

Lines changed: 18 additions & 11 deletions
@@ -17,8 +17,9 @@
1717
<div class="flex items-center space-x-6">
1818
<h1 class="text-xl font-bold text-blue-400">BalatroBench</h1>
1919
<div class="hidden md:flex space-x-4">
20-
<a href="index.html" class="text-gray-300 hover:text-white">Official Benchmark</a>
20+
<a href="index.html" class="text-gray-300 hover:text-white">Leaderboard</a>
2121
<a href="community.html" class="text-blue-400 hover:text-blue-300">Community</a>
22+
<a href="about.html" class="text-gray-300 hover:text-white">About</a>
2223
</div>
2324
</div>
2425
</div>
@@ -44,7 +45,7 @@ <h1 class="text-4xl font-bold bg-gradient-to-r from-blue-400 to-purple-400 bg-cl
4445

4546
<!-- Submit Strategy Button -->
4647
<div class="flex justify-center mt-12 mb-8">
47-
<a href="submit.html" class="inline-block bg-gradient-to-r from-blue-500 to-purple-600 hover:from-blue-600 hover:to-purple-700 text-white px-8 py-4 rounded-lg font-medium transition-all duration-200 shadow-lg text-lg">
48+
<a href="https://github.com/S1M0N38/balatrobench/blob/main/CONTRIBUTING.md" target="_blank" class="inline-block bg-gradient-to-r from-blue-500 to-purple-600 hover:from-blue-600 hover:to-purple-700 text-white px-8 py-4 rounded-lg font-medium transition-all duration-200 shadow-lg text-lg">
4849
<i class="fas fa-plus mr-2"></i>Submit Your Strategy
4950
</a>
5051
</div>
@@ -59,32 +60,38 @@ <h2 class="text-2xl font-bold text-white">How to Submit</h2>
5960
<div class="grid md:grid-cols-3 gap-6 text-sm">
6061
<div class="text-center">
6162
<div class="bg-blue-500 w-12 h-12 rounded-full flex items-center justify-center mx-auto mb-3">
62-
<i class="fas fa-edit text-xl"></i>
63+
<i class="fas fa-code-branch text-xl"></i>
6364
</div>
64-
<h4 class="font-semibold text-white mb-2">1. Prepare Your Prompt</h4>
65+
<h4 class="font-semibold text-white mb-2">1. Fork & Clone</h4>
6566
<p class="text-gray-300">
66-
Create a custom system prompt with your unique strategy approach.
67+
Fork the repository and clone it to your local machine.
6768
</p>
6869
</div>
6970
<div class="text-center">
7071
<div class="bg-purple-500 w-12 h-12 rounded-full flex items-center justify-center mx-auto mb-3">
71-
<i class="fas fa-play text-xl"></i>
72+
<i class="fas fa-file-alt text-xl"></i>
7273
</div>
73-
<h4 class="font-semibold text-white mb-2">2. Run Benchmark</h4>
74+
<h4 class="font-semibold text-white mb-2">2. Create JSON</h4>
7475
<p class="text-gray-300">
75-
Test your prompt against our standard seed set using our API.
76+
Add your strategy as a JSON file in the data/strategies/ folder.
7677
</p>
7778
</div>
7879
<div class="text-center">
7980
<div class="bg-green-500 w-12 h-12 rounded-full flex items-center justify-center mx-auto mb-3">
80-
<i class="fas fa-upload text-xl"></i>
81+
<i class="fas fa-code-pull-request text-xl"></i>
8182
</div>
82-
<h4 class="font-semibold text-white mb-2">3. Submit Results</h4>
83+
<h4 class="font-semibold text-white mb-2">3. Submit PR</h4>
8384
<p class="text-gray-300">
84-
Upload your results with a description of your strategy.
85+
Create a pull request with your strategy data.
8586
</p>
8687
</div>
8788
</div>
89+
<div class="text-center mt-6">
90+
<a href="https://github.com/S1M0N38/balatrobench/blob/main/CONTRIBUTING.md" target="_blank" class="text-blue-400 hover:text-blue-300 transition-colors">
91+
<i class="fas fa-external-link-alt mr-1"></i>
92+
View detailed submission guidelines
93+
</a>
94+
</div>
8895
</div>
8996
</div>
9097
</div>
