Description
Use the framework we're creating in #646 to predict performance of the tootsie run, mostly as a PoC.
Hypothesis or Goal
Verify that we can predict the performance of our 8B model on a variety of metrics from smaller runs trained with WSD-S.
Metrics of interest:
c4_en/bpb
lm_eval/*/acc_norm
lm_eval/*/bpb (when we add it)
lm_eval/*/forced_choice_bpb (when we add it)
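
As a rough illustration of what "predicting from smaller runs" could look like for a loss-like metric such as c4_en/bpb, here is a minimal sketch that fits a pure power law in log space and extrapolates to 8B parameters. It is not the framework from #646. The functional form, the helper names (`fit_power_law`, `predict`), and all numbers are assumptions made for illustration, and the data is synthetic rather than taken from any real run.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size).

    Returns (a, b) such that loss is approximately a * size ** (-b).
    Assumes a pure power law with no irreducible-loss term.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict(a, b, size):
    """Evaluate the fitted power law at a given model size."""
    return a * size ** (-b)


# Synthetic placeholder data generated from a known power law (NOT real results).
small_sizes = [1.5e8, 3e8, 6e8, 1.4e9]  # hypothetical parameter counts
true_a, true_b = 20.0, 0.15             # hypothetical ground-truth coefficients
small_bpb = [true_a * n ** (-true_b) for n in small_sizes]

a, b = fit_power_law(small_sizes, small_bpb)
predicted_8b = predict(a, b, 8e9)
print(f"fitted a={a:.3f}, b={b:.3f}, predicted bpb at 8B: {predicted_8b:.4f}")
```

Accuracy-style metrics such as lm_eval/*/acc_norm are bounded and often do not follow a simple power law, so they would likely need a different functional form than the one assumed here.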
Links
(Delete any that aren't applicable)
Results
(What did you find, including relevant evaluation metrics, etc.)