Add DuoAttention on the fly #63

Merged
SimJeg merged 7 commits into main from simon/duo-attention-on-the-fly
Mar 19, 2025

@SimJeg SimJeg commented Mar 19, 2025

This PR fixes #62 and proposes a new way to compute DuoAttention scores for classifying attention heads as streaming or retrieval heads. I will report results based on this PR in this branch.
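
For readers unfamiliar with the classification step, here is a minimal sketch of how per-head scores could be turned into a streaming/retrieval split. It is not the code from this PR: the function name `classify_heads`, the `streaming_fraction` parameter, and the convention that a higher score means a more retrieval-like head are all assumptions made for illustration.

```python
def classify_heads(scores, streaming_fraction):
    """Return a nested list of booleans, True where a head is treated as streaming.

    scores: hypothetical per-head scores, one list per layer; by assumption,
        a higher score means the head depends more on long-range context.
    streaming_fraction: fraction of all heads to mark as streaming, in [0, 1].
    """
    if not 0.0 <= streaming_fraction <= 1.0:
        raise ValueError("streaming_fraction must be in [0, 1]")

    # Flatten to (score, layer, head) so heads are ranked globally across layers.
    flat = [
        (score, layer, head)
        for layer, row in enumerate(scores)
        for head, score in enumerate(row)
    ]
    n_streaming = round(streaming_fraction * len(flat))

    # Lowest-scoring heads first: these are the ones marked as streaming.
    flat.sort(key=lambda item: item[0])

    mask = [[False] * len(row) for row in scores]
    for _, layer, head in flat[:n_streaming]:
        mask[layer][head] = True
    return mask


# Toy example: 2 layers with 2 heads each, half of the heads marked as streaming.
example_scores = [[0.9, 0.1], [0.4, 0.7]]
streaming_mask = classify_heads(example_scores, streaming_fraction=0.5)
```

Ranking heads globally rather than per layer lets some layers keep more retrieval heads than others; whether this PR does the same is not stated here.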

@SimJeg SimJeg marked this pull request as draft March 19, 2025 11:53
SimJeg added 2 commits March 19, 2025 12:40
Signed-off-by: SimJeg <[email protected]>
@SimJeg SimJeg force-pushed the simon/duo-attention-on-the-fly branch from 8152e87 to c844335 Compare March 19, 2025 12:40
SimJeg added 3 commits March 19, 2025 13:03
Signed-off-by: SimJeg <[email protected]>
Signed-off-by: SimJeg <[email protected]>
@SimJeg SimJeg marked this pull request as ready for review March 19, 2025 13:44

SimJeg commented Mar 19, 2025

Here are the results: not only is the new method faster (from "several hours on an 8 GPU node" to a few seconds on 1 GPU), it is also better! The on-the-fly technique takes 30 seconds with 50 samples, but I obtained similar results using only 10 samples.

image

@maxjeblick maxjeblick left a comment

LGTM, thanks a lot!

SimJeg commented Mar 19, 2025

Additional results with 5 samples instead of 50 to compute head scores on the fly:
image

@SimJeg SimJeg merged commit 56d31a1 into main Mar 19, 2025
3 checks passed
@SimJeg SimJeg deleted the simon/duo-attention-on-the-fly branch March 19, 2025 15:59
giulio98 pushed a commit to miriam-16/kvpress that referenced this pull request Apr 4, 2025
maxjeblick pushed a commit that referenced this pull request Aug 12, 2025

Development

Successfully merging this pull request may close these issues.

Accelerate DuoAttentionPress and QFilterPress

2 participants