Currently the attention example is not very efficient, particularly on GPUs. For example, this for loop could be replaced with a single matrix expression over all source positions, so the input-side projection only needs to be computed once per sentence instead of once per decoding step (see the sketch below):
https://github.com/clab/dynet/blob/master/examples/python/attention.py#L75
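For reference, a minimal sketch of what the batched version might look like, assuming the example's module-level parameter names (`attention_w1`, `attention_w2`, `attention_v`) and using DyNet's `concatenate_cols` and `colwise_add`:

```python
import dynet as dy

def attend(input_mat, state, w1dt):
    # input_mat: encoder states concatenated once per sentence with
    #            dy.concatenate_cols(input_vectors)
    # w1dt:      attention_w1 * input_mat, also computed once per sentence
    w2 = dy.parameter(attention_w2)
    v = dy.parameter(attention_v)
    w2dt = w2 * dy.concatenate(list(state.s()))
    # one matrix expression over all source positions, no per-word loop
    unnormalized = dy.transpose(v * dy.tanh(dy.colwise_add(w1dt, w2dt)))
    att_weights = dy.softmax(unnormalized)
    return input_mat * att_weights  # weighted sum of the encoder states
```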
Also, here the attention model generates its output by sampling randomly from the output distribution instead of greedily selecting the highest-probability symbol, which would be more in line with what we would expect:
https://github.com/clab/dynet/blob/master/examples/python/attention.py#L105
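A minimal change along these lines would do greedy selection; `out_vector` here stands for the decoder's pre-softmax output at each step, as in the example's generate loop:

```python
import numpy as np
import dynet as dy

def pick_next(out_vector):
    """Greedy decoding: take the argmax of the output distribution
    instead of sampling a character from it at random."""
    probs = dy.softmax(out_vector).npvalue()
    return int(np.argmax(probs))
```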