Skip to content

Integrate Q-Filters#54

Merged
SimJeg merged 13 commits intoNVIDIA:mainfrom
alessiodevoto:qfilters
Mar 17, 2025
Merged

Integrate Q-Filters#54
SimJeg merged 13 commits intoNVIDIA:mainfrom
alessiodevoto:qfilters

Conversation

@NathanGodey
Copy link
Copy Markdown
Contributor

PR description

Description of your PR. Fixes # (issue) (if applicable)

New press checklist (if applicable)

  • I added mypress_press.py in the presses directory
  • I added MyPress in __init__.py
  • I updated the README.md with a 1 liner about my new press in the Available presses section
  • I added my press in the default_presses list in tests/default_presses.py

@NathanGodey NathanGodey changed the title qfilters_press Integrate Q-Filters Mar 3, 2025
@SimJeg
Copy link
Copy Markdown
Collaborator

SimJeg commented Mar 4, 2025

Hi, thanks for contributing to kvpress ! Could you resolve conflicts with the main branch so that I can run the CI/CD ? I will add a few comments. Please don't forget to sign off your commits using git commit -s

@SimJeg
Copy link
Copy Markdown
Collaborator

SimJeg commented Mar 10, 2025

@NathanGodey in addition to comments above, you'll need to sign off your past commits for the DCO:

To add your Signed-off-by line to every commit in this branch:
Ensure you have a local copy of your branch by checking out the pull request locally via command line.
In your local branch, run: git rebase HEAD~2 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin qfilters

@SimJeg SimJeg self-assigned this Mar 10, 2025
SimJeg and others added 11 commits March 10, 2025 19:50
Signed-off-by: Nathan <[email protected]>
* Add DuoAttentionPress

* Fix tests and compression_ratio

* Address feedback

* Update plot

* Update version

Signed-off-by: Nathan <[email protected]>
Signed-off-by: Nathan <[email protected]>
Signed-off-by: Nathan <[email protected]>
Signed-off-by: Nathan <[email protected]>
* Add DuoAttentionPress

* Fix tests and compression_ratio

* Address feedback

* Update plot

* Update version

Signed-off-by: Nathan <[email protected]>
Signed-off-by: Nathan <[email protected]>
Signed-off-by: Nathan <[email protected]>
@NathanGodey
Copy link
Copy Markdown
Contributor Author

I'm terrible at git so the rebasing was a bit messy... It should work now, let me know if I can help on anything :)

@SimJeg
Copy link
Copy Markdown
Collaborator

SimJeg commented Mar 11, 2025

@NathanGodey my review was pending since last week, I forgot to publish it 🤦
The DCO now works thank you, don't forget to sign-off your next commits !

@SimJeg
Copy link
Copy Markdown
Collaborator

SimJeg commented Mar 12, 2025

@NathanGodey thanks for all the updates ! I asked for 2 3 4 minor updates, once it's done we're ready to merge, congrats 👏

@SimJeg SimJeg mentioned this pull request Mar 13, 2025
@SimJeg
Copy link
Copy Markdown
Collaborator

SimJeg commented Mar 17, 2025

@NathanGodey could you push the 4 updates I mentioned above ?

For info I ran an experiment where I saved the mean and covariance for ExpectedAttention using 100 samples from the BookSum dataset. For fairness in comparison I also removed the value norm rescaling. With 50% compression on RULER I get:

  • Qfilters: 62.82
  • Expected Attention with on the fly statistics: 73.41
  • Expected Attention with saved statistics: 76.40

So saving statistics makes it both better and faster !

@NathanGodey
Copy link
Copy Markdown
Contributor Author

That sounds great! It seems to confirm that this "unimodal" filtering is inherent to the model and not particularly context-dependent.

Now, I'm wondering what explains the gap between Q-Filters and pre-computed ExpectedAttention, because they start looking more and more the same (maybe covariance? handling of positional encoding? forced sink tokens?); I would also like to see if the saved stats help for larger compression ratios, where Q-Filters seems more effective compared to ExpectedAttention. I'll run the experiments when/if I have some time, let me know if you want to stay in the loop (maybe elsewhere :) ).

@SimJeg
Copy link
Copy Markdown
Collaborator

SimJeg commented Mar 17, 2025

what explains the gap between Q-Filters and pre-computed ExpectedAttention

I would say:

  • apply future RoPE rotation to saved query: I get 79.75 -> 74.56 if I remove this rotation (with saved statistics + vnorm)
  • better estimation of E[exp(<Q, K>)], partly because of covariance (with mean only I get 71.86)

@SimJeg
Copy link
Copy Markdown
Collaborator

SimJeg commented Mar 17, 2025

@NathanGodey you still have 1 comment to address (not sure you received notification in the thread)

@SimJeg SimJeg merged commit 4100647 into NVIDIA:main Mar 17, 2025
3 checks passed
giulio98 pushed a commit to miriam-16/kvpress that referenced this pull request Apr 4, 2025
maxjeblick pushed a commit that referenced this pull request Aug 12, 2025
Signed-off-by: Max Jeblick <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants