We have been running some larger jobs using preemptible v6e compute. I will create issues for them individually and link there.
Overall Setup
The 13B and 24B have been handled roughly identically, so updates for them will be pretty similar. The 70B is a bit different, but the same idea.
Compute
The big challenge with these runs is that they have been run entirely on preemptible compute: K x v6e-{128,256} for some value of K for the 24B and 70B, and K x v6e-{64,128} for the 13B. We change K whenever availability shifts, using a process that is currently very manual.
When we changed the cluster size, we did not change the batch size.
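For concreteness, a minimal sketch of how that can work under synchronous data parallelism: when K shrinks, gradient-accumulation steps grow so the global token batch stays fixed. The constants below (slice sizes, microbatch, sequence length) are illustrative assumptions, not the actual run config.

```python
# Minimal sketch, assuming synchronous data parallelism with gradient
# accumulation. All constants are illustrative, not the actual config.

GLOBAL_BATCH_TOKENS = 4 * 1024 * 1024  # the initial ~4M-token batch
SEQ_LEN = 4096                         # assumed sequence length

def accumulation_steps(num_slices: int, chips_per_slice: int,
                       microbatch_per_chip: int) -> int:
    """Steps so that devices * microbatch * seq_len * steps == global batch."""
    num_devices = num_slices * chips_per_slice
    tokens_per_step = num_devices * microbatch_per_chip * SEQ_LEN
    steps, rem = divmod(GLOBAL_BATCH_TOKENS, tokens_per_step)
    assert rem == 0, "global batch must divide evenly into per-step tokens"
    return steps

# Going from K=4 to K=2 v6e-128 slices doubles the accumulation steps,
# leaving the optimizer trajectory (measured in tokens) unchanged.
print(accumulation_steps(4, 128, 2))  # 1
print(accumulation_steps(2, 128, 2))  # 2
```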
Hypers
All models started with a BS of 4M tokens, but we switched the 13B and 24B to 12M and the 70B to 6M tokens in the middle. These were good improvements. (6M for the 70B fit better in the HBM we typically had, though we should explore CPU offload...)
All of these have the same phase 1 mix as #600. The 13B and 24B were trained with WSD-S for a long time; then we switched to WSD with EMA and a larger batch size. Data is identical in all runs. (It ought to be the same samples in the same order until we changed batch sizes.)
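For reference, here is a minimal sketch of the two pieces named above, a warmup-stable-decay (WSD) schedule and an EMA of the weights; the warmup length, decay fraction, and EMA beta are illustrative assumptions, and the WSD-S cyclic variant is not shown.

```python
# Hedged sketch of WSD and EMA; hyperparameters here are assumptions,
# not the values used in these runs.

def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_steps: int = 2000, decay_frac: float = 0.1) -> float:
    """Linear warmup, long flat plateau, linear decay over the last decay_frac."""
    decay_start = int(total_steps * (1.0 - decay_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < decay_start:
        return peak_lr
    return peak_lr * (total_steps - step) / (total_steps - decay_start)

def ema_update(ema_params: dict, params: dict, beta: float = 0.999) -> dict:
    """One EMA step over a flat dict of parameter arrays."""
    return {k: beta * ema_params[k] + (1.0 - beta) * params[k] for k in params}
```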
LRs were set heuristically... generally scaling something like $lr_{base} \cdot \sqrt{bs} / \sqrt{width}$, but I definitely deviated from that with vibes.
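A hedged sketch of that heuristic: the reference batch size and width used for normalization below are assumptions for illustration, and as noted the real runs deviated from this.

```python
import math

# Hypothetical sketch of the LR scaling heuristic above. bs_base and
# width_base are assumed normalization constants, not values from the runs.
def heuristic_lr(lr_base: float, bs_tokens: int, width: int,
                 bs_base: int = 4 * 1024 * 1024, width_base: int = 4096) -> float:
    """lr_base * sqrt(bs / bs_base) / sqrt(width / width_base)."""
    return lr_base * math.sqrt(bs_tokens / bs_base) / math.sqrt(width / width_base)

# e.g. tripling the batch (4M -> 12M tokens) scales the LR by sqrt(3) ~= 1.73
```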
Links
Results
TODO