Description
Support MuP (Maximal Update Parametrization) in Levanter to make scaling-law analysis easier.
Reference:
MuP paper: https://arxiv.org/pdf/2203.03466
WandB Link: https://wandb.ai/marin-community/marin/reports/621-MuP--VmlldzoxMTIxMTUzNQ?accessToken=h5qjejzau65v7bab5bau94hu8cltm1q9a0v6tdabocd398wagpnr6rjk4u8yc41a
Definition of Done
The MuP implementation is merged into Levanter. A follow-up issue for experiments in marin can be opened once this is done.