Make checkpoint saving folder clear in the config #444
DNXie merged 1 commit into meta-pytorch:main

Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main     #444    +/-  ##
=======================================
- Coverage   64.69%   64.63%   -0.06%
=======================================
  Files          79       79
  Lines        7775     7788     +13
=======================================
+ Hits         5030     5034      +4
- Misses       2745     2754      +9
```
Reviewed config lines:

```yaml
enable: true
initial_load_path: hf://${model}
initial_load_in_hf: true
folder: ./checkpoint  # The folder to save checkpoints to.
```
I think we control these config fields, so we should be opinionated about exposing RL-friendly config field names and re-mapping them to TorchTitan fields internally.
Right now, some TorchTitan config names are really confusing: e.g. the ref model logically should not need checkpointing, yet it still requires checkpoint.enable = true.
Yeah, was discussing this with @joecummings too. Ultimately we probably want to provide some kind of internal config mapping that we execute as a training-script post-init step, before we do any of the actual actor setup (e.g. we could even bake it into our config.parse decorator).
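A minimal sketch of what such an internal mapping step could look like. Everything here is hypothetical: the `RL_TO_TITAN` table, the `remap_config` helper, and the field names are illustrative assumptions, not actual TorchTitan or config.parse APIs.

```python
# Hypothetical post-init re-mapping from RL-friendly config keys to
# TorchTitan's nested fields. All names here are illustrative.

RL_TO_TITAN = {
    # RL-friendly flat key      -> (TorchTitan section, field)
    "checkpoint_folder": ("checkpoint", "folder"),
    "resume_from": ("checkpoint", "initial_load_path"),
}

def remap_config(rl_cfg: dict) -> dict:
    """Translate RL-friendly flat keys into nested TorchTitan-style fields."""
    titan_cfg: dict = {}
    for key, value in rl_cfg.items():
        section, field = RL_TO_TITAN.get(key, (None, key))
        if section is None:
            # Unmapped keys pass through unchanged.
            titan_cfg[field] = value
        else:
            titan_cfg.setdefault(section, {})[field] = value
    # Per the discussion above, TorchTitan needs checkpoint.enable = true even
    # for models that never save (e.g. the ref model), so set it implicitly
    # whenever any checkpoint field is present.
    if "checkpoint" in titan_cfg:
        titan_cfg["checkpoint"].setdefault("enable", True)
    return titan_cfg

if __name__ == "__main__":
    print(remap_config({"checkpoint_folder": "./checkpoint", "lr": 3e-4}))
```

This keeps the user-facing config flat and RL-oriented while the translation to TorchTitan's schema happens in one place, which is the kind of thing that could be folded into a config.parse decorator.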
Made it clear in the config where to specify the checkpoint saving folder, and added comments to clarify the behavior.