
Tootsie 24B #861

@dlwh

Description

(Sometimes I call this a 22B but it's actually 24B)

See #859 for the overall description.

Brief summary

  • Standard Llama 3 architecture with the Llama 3 tokenizer
  • Preemptible TPU v6e compute, using multislice
  • Tootsie Phase 1 Mix (DCLM+Starcoder+Proofpile)
  • Started with WSD-S, then switched mid-run to WSD (warmup-stable-decay) with an EMA of the weights; see the sketch after this list
  • Also increased the batch size mid-run from 4M to 12M tokens
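
For concreteness, here is a minimal sketch of a WSD learning-rate schedule and a weight EMA in plain Python. All constants (peak LR, warmup steps, decay fraction, EMA decay) are illustrative assumptions, not the values used in this run:

```python
def wsd_lr(step, peak_lr=3e-4, warmup=2000, total=100_000, decay_frac=0.1):
    """Warmup-Stable-Decay: linear warmup, flat plateau, linear decay to 0.

    All hyperparameters here are illustrative, not this run's settings.
    """
    decay_start = int(total * (1 - decay_frac))
    if step < warmup:                      # linear warmup from 0 to peak_lr
        return peak_lr * step / warmup
    if step < decay_start:                 # stable plateau at peak_lr
        return peak_lr
    # linear decay over the final decay_frac of training
    return peak_lr * max(0.0, (total - step) / (total - decay_start))


def ema_update(ema_params, params, decay=0.999):
    """One EMA step over a dict of weights; evaluate with the EMA copy."""
    return {k: decay * ema_params[k] + (1 - decay) * params[k] for k in params}
```

Unlike WSD-S, which interleaves short decay cycles so checkpoints are always evaluable, plain WSD stays at the plateau and relies on the EMA copy for smooth intermediate evaluations.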

Hypothesis or Goal

Train a strong, fairly large model, and verify that scaling laws predict its performance.
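
For reference, one common functional form for such predictions is the Chinchilla-style parametric fit (an illustrative assumption; this issue does not specify the exact fit used):

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

where $N$ is the parameter count, $D$ is the number of training tokens, and $E, A, B, \alpha, \beta$ are constants fit on smaller runs, then checked against this model's loss.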

Links

  • WandB Report
  • Data Browser: (link)
  • Experiment JSON: (link)
  • (etc.)

Results

(What did you find, including relevant evaluation metrics, etc.)
