fix wrong 'cls' masking for bigbird qa model output #13143
patrickvonplaten merged 1 commit into huggingface:master
Conversation
Hey @donggyukimc, thanks for your PR; this makes sense to me. Do you by any chance have a reference to the original code / paper showing that the original [CLS] token should not be masked out? Also cc-ing our expert on BigBird here, @vasudevgupta7.
@donggyukimc, I am a little unsure about this. In the original code, they also mask out everything up to the first [SEP]. If we don't mask the [CLS] token, there is a possibility that start_token will point to [CLS] while end_token points to some token in the sequence, so the final answer would include the question as well. Correct me if you feel I am wrong somewhere.
@vasudevgupta7, thank you for your comment. I looked at the QA models from other architectures (BERT, RoBERTa):

transformers/src/transformers/models/bert/modeling_bert.py, lines 1831 to 1863 in 91ff480
transformers/src/transformers/models/roberta/modeling_roberta.py, lines 1518 to 1550 in 91ff480

Even though neither of them applies any mask to the predictions for [CLS] (or for the question tokens), they can be trained without problems in the loss. (Actually, [CLS] shouldn't be masked out, because they predict the unanswerable probability from [CLS].)

As you can see in squad_metrics.py, the QA evaluation pipeline in the transformers library,

transformers/src/transformers/data/metrics/squad_metrics.py, lines 437 to 456 in 91ff480

it computes the unanswerable probability directly from the same MLP logit outputs used for the answerable spans. One of your concerns (that start_token could point to [CLS] while end_token points to some token in the sequence, so the final answer would include the question) is prevented in this part:

transformers/src/transformers/data/metrics/squad_metrics.py, lines 453 to 456 in 91ff480

because the positions of the question tokens do not exist in feature.token_to_orig_map.

Your suggestion of using a separate MLP to predict the unanswerable probability would also work, but then you would have to use different evaluation code instead of squad_metrics.py. Actually, this is how I found the problem: I got wrong prediction results when I used the BigBird QA model with squad_metrics.py.

In my opinion, it is better to use the same prediction mechanism in order to keep compatibility between the other QA model architectures and the QA evaluation pipeline in the transformers library. I'd like to hear your opinion on this. Thank you again for your thoughtful comment, @vasudevgupta7.
Any thoughts on this? @patrickvonplaten @vasudevgupta7
Hey @donggyukimc, so sorry I missed your comment earlier. Given what you pointed out about BERT-like models, I think it's fine to unmask [CLS].
Awesome, merging it then!
What does this PR do?
Currently, the BigBird QA model masks out (assigns a very small value, < -1e6, to) all logits before the context tokens. In doing so, it also masks out the logit from the [CLS] token, because the following function builds the question mask from the position of the first [SEP] token:
transformers/src/transformers/models/big_bird/modeling_big_bird.py, line 3047 in 14e9d29
However, this is the wrong behavior, because the [CLS] token is used to predict the "unanswerable question" case in many QA models.
So I simply change the code so that the masking on the [CLS] token is disabled right after the creation of token_type_ids.
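The behavior described above can be sketched roughly as follows. This is a hypothetical, dependency-free illustration of the masking idea, not the actual modeling_big_bird.py code; the token ids and helper names are assumptions.

```python
# Hypothetical sketch of the question masking: every position up to and
# including the first [SEP] gets a huge negative logit, and the fix
# exempts position 0 so the [CLS] logit survives.

CLS_ID, SEP_ID = 101, 102  # assumed token ids, as in BERT vocabularies

def question_mask(input_ids):
    # 1 for [CLS] + question tokens + first [SEP], 0 for context tokens.
    sep_pos = input_ids.index(SEP_ID)
    return [1 if i <= sep_pos else 0 for i in range(len(input_ids))]

def mask_logits(logits, mask, keep_cls=True):
    if keep_cls:
        mask = [0] + mask[1:]  # the fix: never mask position 0 ([CLS])
    return [l - 1e6 * m for l, m in zip(logits, mask)]

ids = [CLS_ID, 7, 8, SEP_ID, 9, 10, 11, SEP_ID]
logits = [0.0] * len(ids)
masked = mask_logits(logits, question_mask(ids))
# Position 0 ([CLS]) keeps its logit; question positions drop to -1e6.
```

With keep_cls=False (the old behavior), the [CLS] logit would also be pushed to -1e6, making an "unanswerable" prediction impossible downstream.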
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.