Hi,
According to PR #116, we should be able to achieve a 3-4x speedup for both bert-base and bert-large. However, I can only achieve a 2x speedup with bert-base. My Docker image uses CUDA 9.0, while the discussion in PR #116 is based on CUDA 10.0. I am wondering whether that accounts for the difference.
Thanks