I noticed that we're using 32-bit precision for at least some of the model components (possibly the data as well?). I've been reading about quantization (reducing numeric precision) as a way to shrink neural-net parameters in memory: https://www.tensorflow.org/performance/quantization
Maybe worth considering?
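As a rough illustration of the memory savings (a hypothetical NumPy sketch, not our actual model code): simply casting 32-bit float parameters down to 16-bit halves their footprint, at the cost of some numeric precision. Full integer quantization, as in the linked TensorFlow docs, goes further (8-bit), but the float16 case shows the basic trade-off.

```python
import numpy as np

# Hypothetical stand-in for a model's weight matrix (not real project code).
weights_fp32 = np.random.randn(1000, 1000).astype(np.float32)

# Reduce precision: float32 -> float16 halves the memory footprint.
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # 4,000,000 bytes
print(weights_fp16.nbytes)  # 2,000,000 bytes

# The cost: each value is now stored with less precision.
max_err = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()
print(max_err)  # small but nonzero rounding error
```

Whether that precision loss is acceptable would need to be checked against our model's accuracy, which is presumably what the TensorFlow quantization tooling helps evaluate.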