Description
(Sometimes I call this a 22B but it's actually 24B)
See #859 for the overall description.

Brief summary:
- Normal Llama 3 architecture, Llama 3 tokenizer
- Preemptible TPU v6e compute, using multislice
- Tootsie Phase 1 Mix (DCLM+Starcoder+Proofpile)
- Started out with WSD-S but switched to WSD and EMA mid-run (see the sketch after this list)
- Also increased the batch size mid-run from 4M to 12M tokens
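For reference, here's a minimal sketch of what a WSD (warmup-stable-decay) schedule plus a weight EMA looks like in optax/JAX. The step counts, peak LR, and EMA decay below are placeholder values for illustration, not the settings used in this run:

```python
# Sketch of a WSD learning-rate schedule and a parameter EMA (optax/JAX).
# All constants here are hypothetical placeholders, not this run's settings.
import jax
import optax

WARMUP_STEPS = 1_000      # hypothetical
STABLE_STEPS = 200_000    # hypothetical
DECAY_STEPS = 20_000      # hypothetical
PEAK_LR = 3e-4            # hypothetical

lr_schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, PEAK_LR, WARMUP_STEPS),  # warmup
        optax.constant_schedule(PEAK_LR),                    # stable phase
        optax.linear_schedule(PEAK_LR, 0.0, DECAY_STEPS),    # decay to zero
    ],
    boundaries=[WARMUP_STEPS, WARMUP_STEPS + STABLE_STEPS],
)

optimizer = optax.adamw(learning_rate=lr_schedule, weight_decay=0.1)

def update_ema(ema_params, params, decay=0.999):
    """Exponential moving average of the model weights, tracked alongside the optimizer state."""
    return jax.tree_util.tree_map(
        lambda e, p: decay * e + (1.0 - decay) * p, ema_params, params
    )
```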
Hypothesis or Goal
Train an awesome pretty-big model and make sure scaling laws predict its performance.
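Concretely, the check is along these lines: fit a Chinchilla-style loss law on smaller runs and compare the 24B run's measured loss against the prediction. A minimal sketch, using the published Hoffmann et al. (2022) constants purely for illustration (the real check would use fits from our own runs):

```python
# Chinchilla-style loss prediction L(N, D) = E + A / N**alpha + B / D**beta.
# Constants are the Hoffmann et al. (2022) fits, used here only as an example;
# they are NOT fits from this project's runs.
E, A, ALPHA = 1.69, 406.4, 0.34
B, BETA = 410.7, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# e.g. a ~24B-parameter model trained on ~2T tokens (token count is illustrative)
print(predicted_loss(24e9, 2e12))
```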
Links
- WandB Report
- Data Browser: (link)
- Experiment JSON: (link)
- (etc.)
Results
(What did you find, including relevant evaluation metrics, etc.)