How to visualize the attention map for a model with multi-head attention (e.g., vilbert) #917

@CCYChongyanChen

❓ Questions and Help

Overall goal: I am trying to extract a visual attention map from vilbert to see which parts of the image the model attends to.

My question

Question 1:
I know vilbert has three kinds of attention: image attention, text attention, and co-attention. I am not sure whether I should use image attention or co-attention. Currently, I am using image attention.
Question 2:
I know that image attention outputs 6 tensors, each of size (1, 8, 100, 100). I would like to know (1) what the 8, 100, and 100 represent, (2) which tensor I should select, and (3) how I can visualize an attention map from the image attention weights.

My understanding for Question 2:
According to https://github.com/facebookresearch/mmf/blob/3947693aafcc9cc2a16d7c1c5e1479bf0f88ed4b/mmf/configs/models/vilbert/defaults.yaml, 8 appears to be the number of attention heads. My guess is that 1 is the batch size (I changed the batch size to 1) and that 100 is the image width and height.
If that is correct, then Question 2 becomes "how to deal with multiple attention heads?"
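
A quick way to sanity-check this reading of the shape is a small synthetic example. The sketch below assumes the conventional layout of attention probabilities, (batch, heads, query positions, key positions). Under that assumption the two 100s would index positions in the image sequence (for example, image regions) rather than pixel width and height, which would need to be confirmed against the model code. The array is random stand-in data, not actual vilbert output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one layer's image attention: random logits pushed through a
# softmax over the last axis, as in standard scaled dot-product attention.
logits = rng.normal(size=(1, 8, 100, 100))
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
attn = exp / exp.sum(axis=-1, keepdims=True)

batch, heads, n_query, n_key = attn.shape

# If the last axis is the softmax axis, every row is a distribution over keys.
row_sums = attn.sum(axis=-1)
rows_are_distributions = bool(np.allclose(row_sums, 1.0))
print(batch, heads, n_query, n_key, rows_are_distributions)
```

Running the same row-sum check on a real tensor (converted with `.detach().cpu().numpy()` if it is a PyTorch tensor) would indicate which axis holds the attended-to positions.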

Possible solution for Question 2:
I know how to visualize an attention map when the attention weights are a 1D or 2D array. For a 4D array, I am not sure whether it makes sense to call squeeze() directly to reduce it to 2D for visualization, or whether I should average over the attention heads to get 2D attention weights.
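
The two options can be compared on a dummy array. This is a minimal sketch, assuming the layout (batch, heads, query, key) and using random values in place of real weights; the helper names `head_averaged_map` and `attention_received` are made up for illustration. Note that squeeze() only drops axes of size 1, so it removes the batch axis and leaves a 3D array, not a 2D one. Averaging over heads is one common heuristic, not the only choice; plotting each head separately or taking the maximum over heads are alternatives.

```python
import numpy as np

def head_averaged_map(attn):
    """Collapse (1, heads, query, key) attention to a (query, key) matrix."""
    attn = np.asarray(attn)
    if attn.ndim != 4 or attn.shape[0] != 1:
        raise ValueError("expected shape (1, heads, query, key)")
    return attn[0].mean(axis=0)

def attention_received(attn_2d):
    """Average over queries: how much attention each key position receives."""
    return attn_2d.mean(axis=0)

rng = np.random.default_rng(0)
attn = rng.random((1, 8, 100, 100))
attn /= attn.sum(axis=-1, keepdims=True)  # make each row sum to 1

squeezed = np.squeeze(attn)         # only the batch axis has size 1
avg = head_averaged_map(attn)       # 2D: query x key
per_region = attention_received(avg)  # 1D: one weight per key position
print(squeezed.shape, avg.shape, per_region.shape)
```

The 2D matrix `avg` can be shown directly as a heatmap, while the 1D vector `per_region` gives a single weight per attended position.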

Other questions

(1) I am worried that the way the image is represented in transformers makes it impossible to visualize the image attention map for vilbert:
[image]
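
If the image is fed to the model as a set of detected regions with bounding boxes (an assumption about the feature extractor that would need checking), a per-region weight vector can still be turned into a pixel-level heatmap by painting each weight into its box. The sketch below is illustrative only: `regions_to_heatmap` is a hypothetical helper, and the weights and boxes are made-up values in (x1, y1, x2, y2) pixel coordinates.

```python
import numpy as np

def regions_to_heatmap(weights, boxes, height, width):
    """Paint per-region weights into their boxes and rescale to [0, 1]."""
    heat = np.zeros((height, width), dtype=np.float64)
    for w, (x1, y1, x2, y2) in zip(weights, boxes):
        # Clip each box to the image bounds before indexing.
        x1, y1 = max(int(x1), 0), max(int(y1), 0)
        x2, y2 = min(int(x2), width), min(int(y2), height)
        heat[y1:y2, x1:x2] += w  # overlapping boxes accumulate
    peak = heat.max()
    return heat / peak if peak > 0 else heat

# Hypothetical example: three regions on a 4x6 image.
weights = np.array([0.5, 0.3, 0.2])
boxes = np.array([[0, 0, 3, 2], [2, 1, 6, 4], [4, 0, 6, 1]])
heat = regions_to_heatmap(weights, boxes, height=4, width=6)
print(heat.shape, heat.max())
```

The resulting array has the same height and width as the image, so it could be drawn over the image with matplotlib's `imshow` using a partial `alpha`.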

(2) I got two sets of image attention weights from Pythia. Which one should I use for visualization?

Thank you in advance!
