Experiment: not-quite-so-deep cooldown #916

@dlwh

Description

Codename: hypnotic-spoonbill

So #898 was moderately successful: cooling the model down more resulted in better SFT performance. However, we hit an LR floor below which the loss increased, and various attempts to save the run didn't work.

We're going to try another cooldown that isn't quite so deep as deep-raccoon. I'm going to treat 3e-5 as an LR floor and repeat "soft raccoon" with a decay to 3e-5 over 200B tokens, with ~0.3% Tulu 3 and ~1% FLAN. We're hoping that adding some SFT-ish data while we're cooling down will make the model want to be task-y.
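A quick sketch of the intended schedule (assumptions: linear decay in tokens, and the starting LR here is a placeholder, not the run's actual value):

```python
def cooldown_lr(tokens_seen, lr_start=1e-4, lr_floor=3e-5, total_tokens=200e9):
    """Linear decay from lr_start to lr_floor over total_tokens.

    lr_start=1e-4 is a hypothetical placeholder; the real run starts from
    wherever the main phase left off.
    """
    frac = min(tokens_seen / total_tokens, 1.0)
    return lr_floor + (1.0 - frac) * (lr_start - lr_floor)

# At the end of the 200B-token cooldown we land exactly on the 3e-5 floor.
print(cooldown_lr(200e9))
```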

(tulu 3 is ~600M tokens, so this will be about an epoch)
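Back-of-the-envelope check on that epoch claim (assuming the mixing fraction is held constant over the whole cooldown):

```python
cooldown_tokens = 200e9    # total cooldown budget
tulu3_fraction = 0.003     # ~0.3% of the mix
tulu3_size = 600e6         # Tulu 3 is ~600M tokens

tulu3_tokens_seen = cooldown_tokens * tulu3_fraction
epochs = tulu3_tokens_seen / tulu3_size
print(f"Tulu 3 tokens seen: {tulu3_tokens_seen:.0f} (~{epochs:.1f} epochs)")
```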

Hypothesis or Goal

Get a model that makes AlpacaEval go up.

Results

Shockingly, the same freaking thing happened in spoonbill at about the same step, despite the higher LR. We are going to start logging norms of the params, optimizer states, grads, etc. to see if we can spot something weird.
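A minimal sketch of the kind of diagnostics meant here (the helper names and flat-list stand-ins for tensors are hypothetical; the real run would hook this into the trainer's logging):

```python
import math

def global_l2_norm(tensors):
    """Global L2 norm over a list of flat float lists (stand-ins for tensors)."""
    return math.sqrt(sum(x * x for t in tensors for x in t))

def norm_record(step, params, grads, opt_state):
    """Hypothetical per-step record of param / grad / optimizer-state norms."""
    return {
        "step": step,
        "param_norm": global_l2_norm(params),
        "grad_norm": global_l2_norm(grads),
        "opt_state_norm": global_l2_norm(opt_state),
    }

rec = norm_record(0, params=[[3.0, 4.0]], grads=[[0.1]], opt_state=[[0.0]])
print(rec)  # a spike in any of these around the bad step is what we'd look for
```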
