Description
Codename: hypnotic-spoonbill
So #898 was moderately successful: cooling the model down more resulted in better SFT performance. However, we hit an LR floor below which the loss increased, and various attempts to save it didn't work.
We're going to try another cooldown that isn't quite so deep as deep-raccoon. I'm going to treat 3e-5 as an LR floor and repeat "soft raccoon" with a decay to 3e-5 over 200B tokens, mixing in ~0.3% Tulu 3 and ~1% FLAN. We're hoping that adding some SFT-ish data while we're cooling down will make the model want to be task-y.
(Tulu 3 is ~600M tokens, and 0.3% of 200B is 600M, so this will be about one epoch.)
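For concreteness, here's a minimal sketch of the plan above. The peak LR, the linear decay shape, and all names here are illustrative assumptions, not the actual training config:

```python
PEAK_LR = 3e-4            # assumed LR at the start of the cooldown (not confirmed)
FLOOR_LR = 3e-5           # the LR floor we're treating as a hard lower bound
COOLDOWN_TOKENS = 200e9   # decay over 200B tokens

def cooldown_lr(tokens_seen: float) -> float:
    """Linearly decay from PEAK_LR to FLOOR_LR over COOLDOWN_TOKENS, then hold."""
    frac = min(tokens_seen / COOLDOWN_TOKENS, 1.0)
    return PEAK_LR + frac * (FLOOR_LR - PEAK_LR)

# Data mixture for the cooldown: mostly pretraining data, plus a pinch of SFT-ish data.
DATA_MIX = {
    "pretrain": 0.987,  # remainder
    "tulu3": 0.003,     # ~0.3% of 200B ≈ 600M tokens, about one epoch of Tulu 3
    "flan": 0.010,      # ~1%
}
assert abs(sum(DATA_MIX.values()) - 1.0) < 1e-9
```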
Hypothesis or Goal
Get a model that makes AlpacaEval go up.
Links
Results
Shockingly, the same freaking thing happened in spoonbill at about the same step, despite the higher LR. We are going to start logging norms of the params, optimizer states, grads, etc., to see if we can spot something weird.
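As a starting point, something like the following would capture the global norms per step. This is a sketch assuming a PyTorch model with an Adam-style optimizer whose per-param state holds "exp_avg"/"exp_avg_sq"; the real trainer and logging setup will differ:

```python
import torch

@torch.no_grad()
def log_train_norms(model, optimizer, step):
    """Print global L2 norms of params, grads, and Adam moment buffers."""
    sums = {"param": 0.0, "grad": 0.0, "exp_avg": 0.0, "exp_avg_sq": 0.0}
    for p in model.parameters():
        sums["param"] += p.float().pow(2).sum().item()
        if p.grad is not None:
            sums["grad"] += p.grad.float().pow(2).sum().item()
        # Optimizer state keys assume an Adam-style optimizer.
        state = optimizer.state.get(p, {})
        for key in ("exp_avg", "exp_avg_sq"):
            if key in state:
                sums[key] += state[key].float().pow(2).sum().item()
    norms = {k: v ** 0.5 for k, v in sums.items()}
    print(f"step={step} " + " ".join(f"|{k}|={v:.4e}" for k, v in norms.items()))
```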
