Hi, thanks for publishing the paper and sharing the source code.
I noticed that `attn_output` is never used after it is defined.
When training RoBERTa for parameter-efficient learning, the paper version of prefix tuning does not seem to work properly.
Could you please look into it?
In `unify-parameter-efficient-tuning/src/transformers/models/roberta/modeling_roberta.py`, lines 391 to 392 at commit `3222ce2`:

```python
if self.config.attn_mode != "none" and self.config.attn_composition == "gate_add":
    attn_output = context_layer * w_attn + cross_attn_output * w_prefix
```
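To illustrate what I mean, here is a minimal sketch using plain floats in place of tensors (all values are made up for illustration; the "fix" line is only my guess at the intent, not a confirmed patch): since `attn_output` is never read afterwards, the gated prefix contribution is silently dropped, and presumably the composed value should replace `context_layer` so it reaches the output projection.

```python
# Stand-ins for the tensors in the snippet (illustrative scalars only).
context_layer = 1.0      # self-attention output
cross_attn_output = 3.0  # prefix (cross-attention) output
w_attn, w_prefix = 0.75, 0.25  # hypothetical gate weights

# As written in the repo, this result is computed but never used,
# so the forward pass continues with the un-composed context_layer:
attn_output = context_layer * w_attn + cross_attn_output * w_prefix

# Hypothetical fix: assign the composition back so it flows onward.
context_layer = attn_output
print(context_layer)  # 1.0 * 0.75 + 3.0 * 0.25 = 1.5
```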
Thanks!