Hi @monologg! Just a theoretical question about what the BERT for Joint Intent Classification and Slot Filling publication says here:
> The learning objective is to maximize the conditional probability p(y^i, y^s|x). The model is finetuned end-to-end via minimizing the cross-entropy loss.
If I understand correctly, this is not the same as summing the intent and slot losses, as your models do (`total_loss = intent_loss + self.args.slot_loss_coef * slot_loss`). If that part of the paper is correct, shouldn't you first multiply the probabilities calculated from both sets of logits and then apply CrossEntropyLoss to the resulting joint probability?
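To make the comparison concrete, here is a minimal sketch of the two computations in plain Python. It assumes the joint probability factorizes as p(y^i|x) times the product of the per-token slot probabilities, and it uses made-up toy logits and labels; none of the names or values below come from the repository. Under that assumption the two losses come out equal when the slot coefficient is 1 and the slot loss is summed over tokens, because the log of a product is the sum of the logs.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    # Cross-entropy for one example: -log p(target).
    return -math.log(softmax(logits)[target])


# Hypothetical toy values, for illustration only.
intent_logits = [2.0, 0.5, -1.0]
intent_label = 0
slot_logits = [[1.0, 0.2], [0.3, 1.5], [0.0, 0.0]]  # one row per token
slot_labels = [0, 1, 1]

# Option A: sum the two losses (slot coefficient assumed to be 1.0,
# slot loss summed over tokens rather than averaged).
intent_loss = cross_entropy(intent_logits, intent_label)
slot_loss = sum(cross_entropy(l, t) for l, t in zip(slot_logits, slot_labels))
summed_loss = intent_loss + 1.0 * slot_loss

# Option B: multiply the probabilities first, then take -log of the product.
joint_prob = softmax(intent_logits)[intent_label]
for l, t in zip(slot_logits, slot_labels):
    joint_prob *= softmax(l)[t]
joint_loss = -math.log(joint_prob)

print(summed_loss, joint_loss)
```

If this factorization is what the paper intends, any remaining difference would come from `slot_loss_coef` being different from 1, or from the slot loss being averaged over tokens (the default `reduction='mean'` of PyTorch's `CrossEntropyLoss`), which rescales the slot term rather than changing its form.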