The pytorch/fairseq team improved the memory efficiency of their FP16 optimizer by converting the FP16 parameters to FP32 on the fly instead of keeping a static FP32 copy, see facebookresearch/fairseq#404.
Are there any plans to implement this optimization here?
Thanks!
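For context, the idea (as I understand it) can be sketched roughly like this, as a toy NumPy SGD step with loss scaling. The function name and update rule are made up for illustration and are not fairseq's actual implementation:

```python
import numpy as np

def fp16_step_on_the_fly(params_fp16, grads_fp16, lr=0.1, loss_scale=128.0):
    """Instead of holding a persistent FP32 master copy of every parameter,
    cast FP16 params and grads to FP32 only for the duration of the update,
    then store the result back as FP16. (Illustrative sketch only.)"""
    updated = []
    for p16, g16 in zip(params_fp16, grads_fp16):
        p32 = p16.astype(np.float32)                # temporary FP32 copy
        g32 = g16.astype(np.float32) / loss_scale   # unscale the gradient
        p32 -= lr * g32                             # update in FP32 precision
        updated.append(p32.astype(np.float16))      # cast back; FP32 copy freed
    return updated
```

The trade-off is that the transient FP32 buffers exist only during the step, so peak memory drops compared to keeping a static FP32 copy of all parameters alive for the whole run.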