UPDATE: Retro is up at https://marin.readthedocs.io/en/latest/reports/marin-32b-retro/
Description
We're training a 32B! We are starting from nemotron+starcoder+proofpile, since we have strong internal signal, plus external signal from the Nemotron paper itself, that Nemotron is the best large-scale web data.
Results
Seems like a pretty good model!
| Model | Average | AGI Eval LSAT-AR | ARC Easy | ARC Challenge | BoolQ | CommonSense QA | COPA | HellaSwag | lambada_openai | OpenBookQA | PIQA | WinoGrande | WSC | MMLU | GPQA | BBH | MMLU Pro | HumanEval | GSM8K | MATH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Marin 32B (Bison) | 63.0 | 23.4 | 87.8 | 65.8 | 88.9 | 82.3 | 94.0 | 86.6 | 77.4 | 46.6 | 86.1 | 78.61 | 82.42 | 72.9 | 32.13 | 55.2 | 41.9 | 29.27 | 54.71 | 10.35 |
| Marin 32B (Mantis) | 65.2 | 24.8 | 88.0 | 65.7 | 89.4 | 82.8 | 93.0 | 86.9 | 77.2 | 46.4 | 85.9 | 79.3 | 79.5 | 74.7 | 34.0 | 59.6 | 45.1 | 42.7 | 69.1 | 15.3 |
| OLMo 2 32B Base | 63.2 | 22.6 | 85.9 | 61.86 | 83.0 | 78.6 | 93.0 | 85.9 | 78.3 | 47.2 | 83.08 | 78.85 | 86.81 | 71.85 | 32.21 | 56.07 | 42.0 | 23.78 | 76.35 | 12.69 |
| Qwen 2.5 32B Base | 68.1 | 30.43 | 80.81 | 55.89 | 87.65 | 88.45 | 87.0 | 84.11 | 77.62 | 44.4 | 82.4 | 75.7 | 80.95 | 80.83 | 39.01 | 67.35 | 57.9 | 48.78 | 89.31 | 36.25 |
| Gemma 3 27B PT | 65.1 | 22.17 | 88.17 | 65.44 | 87.09 | 73.38 | 93.0 | 83.02 | 78.07 | 45.0 | 84.06 | 79.01 | 91.94 | 75.33 | 35.74 | 61.36 | 49.44 | 17.6 | 82.03 | 25.83 |
| NVIDIA Nemotron Nano 12B v2 Base | 68.6 | 28.7 | 83.59 | 60.58 | 84.83 | 76.09 | 85.0 | 81.42 | 72.93 | 45.8 | 82.81 | 74.35 | 85.35 | 77.9 | 36.58 | 62.02 | 53.13 | 59.15 | 84.08 | 68.28 |
| Model | Mean Rank ↓ | Mean Reciprocal Rank ↑ |
|---|---|---|
| Marin 32B (Bison) | 3.68 | 0.39 |
| Marin 32B (Mantis) | 3.05 | 0.44 |
| OLMo 2 32B Base | 3.89 | 0.34 |
| Qwen 2.5 32B Base | 3.16 | 0.54 |
| Gemma 3 27B PT | 3.37 | 0.39 |
| NVIDIA Nemotron Nano 12B v2 Base | 3.68 | 0.38 |
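For anyone who wants to sanity-check the rank summary, here is a minimal sketch of how Mean Rank and Mean Reciprocal Rank can be derived from the per-benchmark scores. It rests on a few assumptions that are not stated in this issue: the Average column is excluded, each of the 19 benchmarks counts equally, higher is better, and ties share the best rank (a model's rank is 1 plus the number of models that strictly beat it). The names `SCORES` and `rank_summary` are made up for this example. Under those assumptions the output appears to agree with the figures reported above.

```python
# Per-benchmark scores copied from the results table, in column order
# (LSAT-AR ... MATH). The Average column is deliberately left out.
SCORES = {
    "Marin 32B (Bison)": [23.4, 87.8, 65.8, 88.9, 82.3, 94.0, 86.6, 77.4, 46.6, 86.1,
                          78.61, 82.42, 72.9, 32.13, 55.2, 41.9, 29.27, 54.71, 10.35],
    "Marin 32B (Mantis)": [24.8, 88.0, 65.7, 89.4, 82.8, 93.0, 86.9, 77.2, 46.4, 85.9,
                           79.3, 79.5, 74.7, 34.0, 59.6, 45.1, 42.7, 69.1, 15.3],
    "OLMo 2 32B Base": [22.6, 85.9, 61.86, 83.0, 78.6, 93.0, 85.9, 78.3, 47.2, 83.08,
                        78.85, 86.81, 71.85, 32.21, 56.07, 42.0, 23.78, 76.35, 12.69],
    "Qwen 2.5 32B Base": [30.43, 80.81, 55.89, 87.65, 88.45, 87.0, 84.11, 77.62, 44.4, 82.4,
                          75.7, 80.95, 80.83, 39.01, 67.35, 57.9, 48.78, 89.31, 36.25],
    "Gemma 3 27B PT": [22.17, 88.17, 65.44, 87.09, 73.38, 93.0, 83.02, 78.07, 45.0, 84.06,
                       79.01, 91.94, 75.33, 35.74, 61.36, 49.44, 17.6, 82.03, 25.83],
    "NVIDIA Nemotron Nano 12B v2 Base": [28.7, 83.59, 60.58, 84.83, 76.09, 85.0, 81.42, 72.93,
                                         45.8, 82.81, 74.35, 85.35, 77.9, 36.58, 62.02, 53.13,
                                         59.15, 84.08, 68.28],
}


def rank_summary(scores):
    """Return {model: (mean_rank, mean_reciprocal_rank)} across all benchmarks.

    Assumption: tied models share the best rank, so a model's rank on a
    benchmark is 1 + the number of models with a strictly higher score.
    """
    models = list(scores)
    n_tasks = len(scores[models[0]])
    summary = {}
    for model in models:
        ranks = []
        for task in range(n_tasks):
            better = sum(1 for other in models if scores[other][task] > scores[model][task])
            ranks.append(1 + better)
        mean_rank = sum(ranks) / n_tasks
        mean_reciprocal_rank = sum(1 / r for r in ranks) / n_tasks
        summary[model] = (mean_rank, mean_reciprocal_rank)
    return summary


SUMMARY = rank_summary(SCORES)
for model, (mean_rank, mrr) in SUMMARY.items():
    print(f"{model}: mean rank {mean_rank:.2f}, MRR {mrr:.2f}")
```

The tie rule only matters for COPA here, where three models score 93.0. A different tie rule (for example, averaging tied ranks) would shift the mean ranks slightly.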
Links
- exp1295_32b: Data Browser Link
- exp1390_32b_necro: Data Browser Link
- exp1380_muon32b: Data Browser Link
- exp1395_qwen3_32b: Data Browser Link
- exp1529_32b_bison_cooldown: Data Browser Link
- exp1529_32b_mantis_cooldown: Data Browser Link