Tootsie 70b #750

@dlwh

Description

This model is trained from scratch with WSD (warmup-stable-decay learning rate schedule) + EMA (exponential moving average of the weights), but otherwise starts the same as the other tootsies (#859, #600).

Still using DCLM+code+math for now.
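
For readers unfamiliar with the WSD + EMA combination mentioned above, here is a minimal sketch in plain Python. It is not the config used for this run: the function names, the linear warmup and decay shapes, and every numeric value (`peak_lr`, `warmup_steps`, `decay_steps`, `beta`) are assumptions chosen only for illustration.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay schedule (hypothetical helper, linear ramps assumed).

    Linear warmup to peak_lr, a constant plateau, then a linear decay to
    min_lr over the final decay_steps steps.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: interpolate linearly from peak_lr down to min_lr.
    frac = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + frac * (min_lr - peak_lr)


def ema_update(ema_params, params, beta=0.995):
    """One EMA step over a flat list of weights (beta is an assumed value)."""
    return [beta * e + (1.0 - beta) * p for e, p in zip(ema_params, params)]


if __name__ == "__main__":
    # Toy numbers only; a real run would use far more steps.
    for s in (0, 9, 50, 90, 100):
        print(s, wsd_lr(s, total_steps=100, peak_lr=1.0,
                        warmup_steps=10, decay_steps=20))
```

In a sketch like this, the averaged weights track the raw weights during the stable phase, which is one reason the two techniques are often paired: the EMA copy can be evaluated without first running a decay phase.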

Hypothesis or Goal

Just make a real freaking good model.

Links
