Marin 32B Base #1295

@dlwh

UPDATE: Retro is up at https://marin.readthedocs.io/en/latest/reports/marin-32b-retro/

Description

We're training a 32B model! We are starting from nemotron+starcoder+proofpile, since we have both strong internal signal and external signal (the Nemotron paper itself) that Nemotron is the best large-scale web data.

Results

Seems like a pretty good model!

| Model | Average | AGI Eval LSAT-AR | ARC Easy | ARC Challenge | BoolQ | CommonSense QA | COPA | HellaSwag | lambada_openai | OpenBookQA | PIQA | WinoGrande | WSC | MMLU | GPQA | BBH | MMLU Pro | HumanEval | GSM8K | MATH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Marin 32B (Bison) | 63.0 | 23.4 | 87.8 | 65.8 | 88.9 | 82.3 | 94.0 | 86.6 | 77.4 | 46.6 | 86.1 | 78.61 | 82.42 | 72.9 | 32.13 | 55.2 | 41.9 | 29.27 | 54.71 | 10.35 |
| Marin 32B (Mantis) | 65.2 | 24.8 | 88.0 | 65.7 | 89.4 | 82.8 | 93.0 | 86.9 | 77.2 | 46.4 | 85.9 | 79.3 | 79.5 | 74.7 | 34.0 | 59.6 | 45.1 | 42.7 | 69.1 | 15.3 |
| OLMo 2 32B Base | 63.2 | 22.6 | 85.9 | 61.86 | 83.0 | 78.6 | 93.0 | 85.9 | 78.3 | 47.2 | 83.08 | 78.85 | 86.81 | 71.85 | 32.21 | 56.07 | 42.0 | 23.78 | 76.35 | 12.69 |
| Qwen 2.5 32B Base | 68.1 | 30.43 | 80.81 | 55.89 | 87.65 | 88.45 | 87.0 | 84.11 | 77.62 | 44.4 | 82.4 | 75.7 | 80.95 | 80.83 | 39.01 | 67.35 | 57.9 | 48.78 | 89.31 | 36.25 |
| Gemma 3 27B PT | 65.1 | 22.17 | 88.17 | 65.44 | 87.09 | 73.38 | 93.0 | 83.02 | 78.07 | 45.0 | 84.06 | 79.01 | 91.94 | 75.33 | 35.74 | 61.36 | 49.44 | 17.6 | 82.03 | 25.83 |
| NVIDIA Nemotron Nano 12B v2 Base | 68.6 | 28.7 | 83.59 | 60.58 | 84.83 | 76.09 | 85.0 | 81.42 | 72.93 | 45.8 | 82.81 | 74.35 | 85.35 | 77.9 | 36.58 | 62.02 | 53.13 | 59.15 | 84.08 | 68.28 |

| Model | Mean Rank ↓ | Mean Reciprocal Rank ↑ |
|---|---|---|
| Marin 32B (Bison) | 3.68 | 0.39 |
| Marin 32B (Mantis) | 3.05 | 0.44 |
| OLMo 2 32B Base | 3.89 | 0.34 |
| Qwen 2.5 32B Base | 3.16 | 0.54 |
| Gemma 3 27B PT | 3.37 | 0.39 |
| NVIDIA Nemotron Nano 12B v2 Base | 3.68 | 0.38 |
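
For readers unfamiliar with the two summary metrics, here is a minimal sketch of how a mean rank and mean reciprocal rank could be computed from per-task scores. It is not the script used to produce the numbers above: the function name `rank_summary` is hypothetical, the tie handling (rank = 1 + number of strictly better models) is an assumption, and the example uses only four columns (MMLU, HumanEval, GSM8K, MATH) for three of the models, so its output will not match the reported values.

```python
from statistics import mean


def rank_summary(scores):
    """Return {model: (mean_rank, mean_reciprocal_rank)}.

    `scores` maps a model name to a list of per-task scores, where higher
    is better and every list covers the same tasks in the same order.
    """
    models = list(scores)
    n_tasks = len(next(iter(scores.values())))
    ranks = {m: [] for m in models}
    for t in range(n_tasks):
        for m in models:
            # Assumed tie rule: rank is 1 + number of models strictly better.
            better = sum(scores[o][t] > scores[m][t] for o in models)
            ranks[m].append(1 + better)
    return {m: (mean(r), mean(1 / x for x in r)) for m, r in ranks.items()}


# Columns taken from the results table: MMLU, HumanEval, GSM8K, MATH.
subset = {
    "Marin 32B (Mantis)": [74.7, 42.7, 69.1, 15.3],
    "OLMo 2 32B Base": [71.85, 23.78, 76.35, 12.69],
    "Qwen 2.5 32B Base": [80.83, 48.78, 89.31, 36.25],
}

for model, (mean_rank, mrr) in rank_summary(subset).items():
    print(f"{model}: mean rank {mean_rank:.2f}, MRR {mrr:.2f}")
```

Mean rank treats every position equally, while mean reciprocal rank rewards first-place finishes much more heavily, which is why two models with similar mean ranks can have noticeably different reciprocal-rank scores.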
