Scaling laws to predict tootsie performance

## Description

Use the framework we're creating in #646 to predict the performance of the tootsie run, mostly as a proof of concept (PoC).

## Hypothesis or Goal

Verify that we can predict the performance of our 8B model on a variety of metrics from smaller runs trained with WSD-S.

Metrics of interest:

* `c4_en/bpb`
* `lm_eval/*/acc_norm`
* `lm_eval/*/bpb` (when we add it)
* `lm_eval/*/forced_choice_bpb` (when we add it)
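The goal above amounts to fitting a curve to each metric from the smaller runs and extrapolating it to the 8B run's compute. Below is a minimal sketch. It assumes a pure power law in training compute (no irreducible-loss term) fit by ordinary least squares in log-log space. It is not necessarily what the #646 framework does. The function names are hypothetical, and the data points are synthetic, not real run results.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(compute: Sequence[float], metric: Sequence[float]) -> Tuple[float, float]:
    """Fit metric ~= a * compute**(-b) by least squares on log(metric) vs log(compute).

    Hypothetical helper for illustration; assumes all values are positive
    and the metric decreases as a pure power law (no irreducible term).
    Returns (a, b).
    """
    if len(compute) != len(metric) or len(compute) < 2:
        raise ValueError("need at least two (compute, metric) pairs of equal length")
    xs = [math.log(c) for c in compute]
    ys = [math.log(m) for m in metric]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Ordinary least-squares slope and intercept of the log-log line.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # log(metric) = log(a) - b * log(compute)
    return math.exp(intercept), -slope


def predict(a: float, b: float, compute: float) -> float:
    """Extrapolate the fitted power law to a larger compute budget."""
    return a * compute ** (-b)


# Synthetic points generated from metric = 10.0 * C**-0.05 (illustration only, not real runs).
small_compute = [1e18, 1e19, 1e20]
small_bpb = [10.0 * c ** -0.05 for c in small_compute]
a, b = fit_power_law(small_compute, small_bpb)
print(f"a={a:.3f}, b={b:.4f}, predicted bpb at 1e23 FLOPs: {predict(a, b, 1e23):.3f}")
```

For accuracy-style metrics such as `acc_norm`, a direct power law is usually a poor fit. A common alternative is to predict a bpb-style metric first and then map it to accuracy with a separately fitted function. That mapping is outside this sketch.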

### Links


* WandB Report (scaling law): https://wandb.ai/marin-community/marin/reports/654-Scaling-Law--VmlldzoxMDU2MjYwNQ?accessToken=oziscr4jytpwbat2z8erg4z7xapmi7hf82quru1b5qwu5ekklaes79gfr1b8eukk
* WandB Report (scaling laws with soft metrics): https://wandb.ai/marin-community/marin/reports/654-Scaling-Laws-with-soft-metrics--VmlldzoxMDk2ODkxNw?accessToken=5mk1trabr6p2vpn2qxub79wqcsp1q0zk9yrzvvf9e2t8zqid7s9868v104m4amhx
* Experiment JSON: (broken) https://marlin-subtle-barnacle.ngrok-free.app/experiment?path=gs%3A//marin-us-central2/experiments/exp654_scaling_tootsie-ab264d.json



## Results



