Thanks for your great work!
I have read your paper, but I am a bit confused about two things.
(1) The instantiation of Multi-head PA. How should Multi-head PA (r = 30) be instantiated so that it has the same number of tuned parameters as PA (attn, r = 30), as reported in Table 4 of the main paper? My initial thought was that Multi-head PA's tuned parameter count would be N_h times that of PA. (I sketch the rough count I had in mind below, after question (2).)
(2) The design choice of the MAM adapter. As I understand it, MH PA (attn, r = 30) is slightly better than prefix tuning (l = 30) according to the results in Table 4 (35.3 > 35.2), and previous papers such as LoRA report that prefix tuning is not stable to optimize. Nevertheless, the MAM adapter adopts prefix tuning. Is there a specific reason for this choice?
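To make question (1) concrete, here is the rough count I had in mind. The numbers d = 1024 and N_h = 16 (e.g., BART/mBART-large) and the two possible readings of "multi-head" are my own assumptions, not taken from the paper; please correct me if the actual instantiation is different:

```python
# Rough parameter count for question (1). Assumptions are mine, not from the
# paper: hidden size d = 1024, N_h = 16 heads (BART/mBART-large), biases ignored.
d, n_heads, r = 1024, 16, 30
d_head = d // n_heads  # 64

# PA (attn, r = 30): one bottleneck adapter on the full hidden state,
# W_down: d x r, W_up: r x d.
pa = d * r + r * d                                   # 61,440

# Reading (a) of Multi-head PA: each head gets its own adapter over the full
# hidden dimension d -> N_h times the parameters of PA (my initial thought).
mh_pa_full = n_heads * (d * r + r * d)               # 983,040

# Reading (b): each head's adapter acts only on its own slice of size d / N_h
# (W_down: d_head x r, W_up: r x d_head) -> same total as PA.
mh_pa_split = n_heads * (d_head * r + r * d_head)    # 61,440

print(pa, mh_pa_full, mh_pa_split)
```

Is reading (b) the instantiation used for Table 4, or is the parameter count kept matched in some other way?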
Would you mind giving me some pointers on these two questions?