Integrate Q-Filters by NathanGodey · Pull Request #54 · NVIDIA/kvpress

NathanGodey · 2025-03-03T17:28:12Z

PR description

Description of your PR. Fixes # (issue) (if applicable)

New press checklist (if applicable)

I added mypress_press.py in the presses directory
I added MyPress in __init__.py
I updated the README.md with a 1 liner about my new press in the Available presses section
I added my press in the default_presses list in tests/default_presses.py

SimJeg · 2025-03-04T07:29:32Z

Hi, thanks for contributing to kvpress! Could you resolve the conflicts with the main branch so that I can run the CI/CD? I will add a few comments. Please don't forget to sign off your commits using git commit -s

SimJeg · 2025-03-10T17:39:30Z

@NathanGodey in addition to the comments above, you'll need to sign off your past commits for the DCO:

To add your Signed-off-by line to every commit in this branch:
Ensure you have a local copy of your branch by checking out the pull request locally via command line.
In your local branch, run: git rebase HEAD~2 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin qfilters

NathanGodey · 2025-03-10T18:53:38Z

I'm terrible at git so the rebasing was a bit messy... It should work now, let me know if I can help with anything :)

SimJeg · 2025-03-11T07:55:32Z

@NathanGodey my review has been pending since last week, I forgot to publish it 🤦
The DCO check now passes, thank you. Don't forget to sign off your next commits!

SimJeg · 2025-03-12T14:15:02Z

@NathanGodey thanks for all the updates! I asked for 4 minor updates; once they're done we're ready to merge, congrats 👏

SimJeg · 2025-03-17T07:58:52Z

@NathanGodey could you push the 4 updates I mentioned above?

For info, I ran an experiment where I saved the mean and covariance for ExpectedAttention, computed on 100 samples from the BookSum dataset. For a fair comparison I also removed the value norm rescaling. With 50% compression on RULER I get:

Q-Filters: 62.82
Expected Attention with on-the-fly statistics: 73.41
Expected Attention with saved statistics: 76.40

So saving the statistics makes it both better and faster!
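As a rough illustration of what pre-computing such statistics could involve, the sketch below estimates the mean and covariance of a set of vectors offline so they can be stored and reused. It is not the code used for the experiment above: the function name `mean_and_covariance`, the JSON serialization, and the toy vectors are all hypothetical, and real query statistics would be gathered per layer and per head from model activations.

```python
import json


def mean_and_covariance(vectors):
    """Return (mean, covariance) for a list of equal-length vectors.

    The covariance uses the unbiased (n - 1) normalisation.
    """
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two vectors to estimate a covariance")
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        centered = [v[i] - mean[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return mean, cov


# Toy stand-in for query vectors collected on calibration samples (hypothetical data).
toy_queries = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
toy_mean, toy_cov = mean_and_covariance(toy_queries)

# Serialize once; a later run can load the statistics instead of recomputing them.
saved = json.dumps({"mean": toy_mean, "cov": toy_cov})
loaded = json.loads(saved)
print(loaded["mean"], loaded["cov"])
```

Computing the statistics once and loading them later removes the per-context estimation step, which is consistent with the "faster" part of the observation above.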

NathanGodey · 2025-03-17T09:49:08Z

That sounds great! It seems to confirm that this "unimodal" filtering is inherent to the model and not particularly context-dependent.

Now I'm wondering what explains the gap between Q-Filters and pre-computed ExpectedAttention, because they are starting to look more and more alike (maybe covariance? handling of positional encoding? forced sink tokens?). I would also like to see whether the saved stats help at larger compression ratios, where Q-Filters seems more effective than ExpectedAttention. I'll run the experiments when/if I have some time; let me know if you want to stay in the loop (maybe elsewhere :) ).

SimJeg · 2025-03-17T09:57:11Z

> what explains the gap between Q-Filters and pre-computed ExpectedAttention

I would say:

applying the future RoPE rotation to the saved query: the score drops from 79.75 to 74.56 if I remove this rotation (with saved statistics + vnorm)
a better estimate of E[exp(<Q, K>)], partly thanks to the covariance (with the mean only I get 71.86)
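To make the covariance point concrete, here is a minimal sketch under the assumption that queries are modeled as Gaussian, Q ~ N(mu, Sigma). In that case the Gaussian moment generating function gives E[exp(<Q, k>)] = exp(<mu, k> + 0.5 * k^T Sigma k), so dropping the covariance term leaves only exp(<mu, k>). The function names and toy numbers are hypothetical, and this is not the kvpress code; the Monte Carlo helper exists only to check the closed form.

```python
import math
import random


def expected_exp_score(key, mean, cov=None):
    """Closed-form E[exp(<q, key>)] for q ~ N(mean, cov).

    With cov=None, only the mean term exp(<mean, key>) is used.
    """
    dim = len(key)
    log_score = sum(m * k for m, k in zip(mean, key))
    if cov is not None:
        # Quadratic form key^T cov key, the second-order term of the Gaussian MGF.
        quad = sum(key[i] * cov[i][j] * key[j] for i in range(dim) for j in range(dim))
        log_score += 0.5 * quad
    return math.exp(log_score)


def monte_carlo_exp_score(key, mean, chol, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[exp(<q, key>)] with q = mean + chol @ z, z ~ N(0, I)."""
    rng = random.Random(seed)
    dim = len(key)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        q = [mean[i] + sum(chol[i][j] * z[j] for j in range(dim)) for i in range(dim)]
        total += math.exp(sum(qi * ki for qi, ki in zip(q, key)))
    return total / n_samples


# Toy 2-D example (hypothetical numbers); toy_cov equals toy_chol @ toy_chol^T.
toy_mean = [0.5, -0.2]
toy_chol = [[0.6, 0.0], [0.3, 0.4]]
toy_cov = [[0.36, 0.18], [0.18, 0.25]]
toy_key = [1.0, 0.5]

print("mean only      :", expected_exp_score(toy_key, toy_mean))
print("mean + cov     :", expected_exp_score(toy_key, toy_mean, toy_cov))
```

Because a covariance matrix is positive semi-definite, the quadratic term can only raise the score, and it raises it more for keys aligned with high-variance query directions. A mean-only score therefore can rank keys differently from the full estimate.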

SimJeg · 2025-03-17T10:36:12Z

@NathanGodey you still have 1 comment to address (not sure you received the notification in the thread)

NathanGodey changed the title ~~qfilters_press~~ Integrate Q-Filters Mar 3, 2025

SimJeg self-assigned this Mar 10, 2025

SimJeg and others added 11 commits March 10, 2025 19:50

Add epsilon to ExpectedAttentionPress (#47)

52b3541

Signed-off-by: Nathan <[email protected]>

Fix distributed inference (#49)

7180135

Signed-off-by: Nathan <[email protected]>

qfilters_press

8565cbb

Signed-off-by: Nathan <[email protected]>

Add DuoAttentionPress (#50)

9e0f6b0

* Add DuoAttentionPress * Fix tests and compression_ratio * Address feedback * Update plot * Update version Signed-off-by: Nathan <[email protected]>

add ChunkKV

e25b7a6

Signed-off-by: Nathan <[email protected]>

fix style

48270fa

Signed-off-by: Nathan <[email protected]>

qfilters_press

dbfc374

Signed-off-by: Nathan <[email protected]>

Add DuoAttentionPress (#50)

8fc790e

* Add DuoAttentionPress * Fix tests and compression_ratio * Address feedback * Update plot * Update version Signed-off-by: Nathan <[email protected]>

add ChunkKV

850e107

Signed-off-by: Nathan <[email protected]>

fix style

4888dca

Signed-off-by: Nathan <[email protected]>

Merge branch 'main' into qfilters

66a0593

SimJeg reviewed Mar 11, 2025

README.md (outdated, resolved)

kvpress/presses/qfilter_press.py (resolved)

tests/default_presses.py (resolved)

kvpress/__init__.py (outdated, resolved)

kvpress/presses/duo_attention_press.py (outdated, resolved)

implement discussion suggestions

3bb4077

Signed-off-by: Nathan <[email protected]>

SimJeg reviewed Mar 12, 2025

kvpress/presses/duo_attention_press.py (outdated, resolved)

SimJeg reviewed Mar 12, 2025

README.md (outdated, resolved)

SimJeg mentioned this pull request Mar 13, 2025

Update copyright #60

Merged

SimJeg reviewed Mar 13, 2025

kvpress/presses/qfilter_press.py (outdated, resolved)

SimJeg reviewed Mar 14, 2025

kvpress/presses/qfilter_press.py (outdated, resolved)

copyright + readme + final fixes

6c6bc0f

Signed-off-by: Nathan <[email protected]>

SimJeg approved these changes Mar 17, 2025

SimJeg merged commit 4100647 into NVIDIA:main Mar 17, 2025
3 checks passed

giulio98 pushed a commit to miriam-16/kvpress that referenced this pull request Apr 4, 2025

Add QFilterPress (NVIDIA#54)

e71f4e9

Signed-off-by: giulio98 <[email protected]>

maxjeblick pushed a commit that referenced this pull request Aug 12, 2025

Add QFilterPress (#54)

c578e4a

Signed-off-by: Max Jeblick <[email protected]>

4 participants
