[feat] Model version control using W&B Artifacts #1137
ayulockin wants to merge 19 commits into facebookresearch:main from
Conversation
…rgs (#1): ability to log the config file, initialize wandb with kwargs, and pass an entity argument for team accounts.
Hey @ebsmothers, thought of tagging you here for visibility since you looked over my first PR.

@ayulockin Thanks for the PR! Give us a few days to review this. We will get back to you soon.
ebsmothers left a comment:
Thanks for the PR, and for your patience on the review. The changes look good. Can you rebase to factor out the changes from PR#1129? Alternatively we can just close the other PR and use this one instead, whichever you prefer.
Hey @ebsmothers, I rebased to factor in the changes. This PR now contains all the changes from PR #1129. Please take a look and let me know; if you want, you can close PR #1129.

@ebsmothers has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ayulockin has updated the pull request. You must reimport the pull request before landing.

@ebsmothers has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: 🚀 I have extended the `WandbLogger` with the ability to log the `current.pt` checkpoint as W&B Artifacts. Note that this PR is based on top of this [PR](#1129).

### What is W&B Artifacts?

> W&B Artifacts was designed to make it effortless to version your datasets and models, regardless of whether you want to store your files with us or whether you already have a bucket you want us to track. Once you've tracked your dataset or model files, W&B will automatically log each and every modification, giving you a complete and auditable history of changes to your files.

Through this PR, W&B Artifacts can help save and organize machine learning models throughout a project's lifecycle. More details in the documentation [here](https://docs.wandb.ai/guides/artifacts/model-versioning).

### Modification

This PR adds a `log_model_checkpoint` method to the `WandbLogger` class in the `utils/logger.py` file. This method is called from the `utils/checkpoint.py` file.

### Usage

To use this, set `training.wandb.enabled=true` and `training.wandb.log_checkpoint=true` in `config/defaults.yaml`.

### Result

The screenshot shows the `current.pt` checkpoints saved at intervals defined by `training.checkpoint_interval`. You can check out the logged artifacts page [here](https://wandb.ai/ayut/mmf/artifacts/model/run_ey9xextf_model/0dc64164acbdc300fd01/api).

### Superpowers

With this small addition, one can easily track different versions of the model, download a checkpoint of interest using the API in the API tab, share checkpoints with teammates, and more.

### Requests

This is a draft PR, as there are a few things that can still be improved:

* Is there a better way to access the path to the `current.pt` checkpoint? In other words, is the modification made to `utils/checkpoint.py` an acceptable way of approaching this?
* While logging a file as a W&B Artifact, we can also provide metadata associated with that file. In this case, we can add the current iteration, training metrics, etc. as the metadata. I would love suggestions on the different data points to log as metadata alongside the checkpoints.
* How should we determine whether a checkpoint is the best one? If a checkpoint is the best, I can add `best` as an alias for that checkpoint's artifact.

Pull Request resolved: #1137

Test Plan: Imported from GitHub, without a `Test Plan:` line.

**Static Docs Preview: mmf**

* [Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D32402090/V6/mmf/)
* Modified Pages: [docs/notes/logger](https://our.intern.facebook.com/intern/staticdocs/eph/D32402090/V6/mmf/docs/notes/logger/)

Reviewed By: apsdehal

Differential Revision: D32402090

Pulled By: ebsmothers

fbshipit-source-id: 94b881ec55c4197301331d571bc926521e2feecc
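To illustrate the pattern the summary describes (a `log_model_checkpoint` method that wraps a checkpoint file in a model artifact with metadata and aliases), here is a minimal sketch. The real `wandb` module is replaced with small stub classes so the example runs without a W&B account; the stubs, the `run_<id>_model` naming, and the function signature are assumptions for illustration, not the PR's actual implementation.

```python
# Hedged sketch of checkpoint-as-artifact logging. StubArtifact and
# StubRun stand in for wandb.Artifact and a wandb run object so the
# example is self-contained; swap in the real wandb objects in practice.
import os
import tempfile


class StubArtifact:
    """Illustrative stand-in for wandb.Artifact."""
    def __init__(self, name, type, metadata=None):
        self.name = name
        self.type = type
        self.metadata = metadata or {}
        self.files = []

    def add_file(self, path):
        self.files.append(path)


class StubRun:
    """Illustrative stand-in for a wandb run."""
    def __init__(self, run_id):
        self.id = run_id
        self.logged = []

    def log_artifact(self, artifact, aliases=None):
        self.logged.append((artifact, aliases or ["latest"]))


def log_model_checkpoint(run, checkpoint_path, metadata=None, aliases=None,
                         artifact_cls=StubArtifact):
    """Log a checkpoint file as a model artifact, attaching optional
    metadata (e.g. current iteration, metrics) and aliases (e.g. "best")."""
    artifact = artifact_cls(
        name=f"run_{run.id}_model", type="model", metadata=metadata or {}
    )
    artifact.add_file(checkpoint_path)
    run.log_artifact(artifact, aliases=aliases)
    return artifact


if __name__ == "__main__":
    run = StubRun("ey9xextf")
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "current.pt")
        open(ckpt, "wb").close()  # empty placeholder checkpoint
        art = log_model_checkpoint(
            run, ckpt, metadata={"iteration": 1000}, aliases=["latest", "best"]
        )
    print(art.name)          # run_ey9xextf_model
    print(run.logged[0][1])  # ['latest', 'best']
```

With the real library, `artifact_cls` would be `wandb.Artifact` and `run` the object returned by `wandb.init()`; the "best checkpoint" question from the Requests section maps naturally onto passing `aliases=["best"]`.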