Integrate Q-Filters #54
Conversation
Hi, thanks for contributing to kvpress! Could you resolve the conflicts with the main branch so that I can run the CI/CD? I will add a few comments. Please don't forget to sign off your commits using
@NathanGodey in addition to comments above, you'll need to sign off your past commits for the DCO:
Signed-off-by: Nathan <[email protected]>
* Add DuoAttentionPress * Fix tests and compression_ratio * Address feedback * Update plot * Update version Signed-off-by: Nathan <[email protected]>
I'm terrible at git, so the rebasing was a bit messy... It should work now, let me know if I can help with anything :)
@NathanGodey my review has been pending since last week, I forgot to publish it 🤦
@NathanGodey thanks for all the updates! I asked for
@NathanGodey could you push the 4 updates I mentioned above? FYI, I ran an experiment where I saved the mean and covariance for ExpectedAttention using 100 samples from the BookSum dataset. For a fair comparison I also removed the value norm rescaling. With 50% compression on RULER I get:
So saving the statistics makes it both better and faster!
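For readers following along, here is a minimal sketch of what "saving the mean and covariance" over a set of samples could look like. It is not the kvpress implementation: it assumes the statistics are taken over query vectors of shape (n_tokens, head_dim), and the names `accumulate_query_stats`, `save_stats` and `load_stats` are hypothetical helpers written for illustration only.

```python
import numpy as np


def accumulate_query_stats(batches):
    """Return (mean, covariance) over all vectors seen in `batches`.

    Assumption: each batch is an array of shape (n_tokens, head_dim),
    e.g. the query vectors of one attention head for one sample.
    The covariance is the population covariance (divides by N).
    """
    count = 0
    total = None   # running sum of vectors, shape (head_dim,)
    outer = None   # running sum of outer products, shape (head_dim, head_dim)
    for q in batches:
        q = np.asarray(q, dtype=np.float64)
        if total is None:
            dim = q.shape[1]
            total = np.zeros(dim)
            outer = np.zeros((dim, dim))
        count += q.shape[0]
        total += q.sum(axis=0)
        outer += q.T @ q
    if count == 0:
        raise ValueError("no vectors were provided")
    mean = total / count
    cov = outer / count - np.outer(mean, mean)
    return mean, cov


def save_stats(file, mean, cov):
    """Store the statistics in a .npz archive (path or file-like object)."""
    np.savez(file, mean=mean, cov=cov)


def load_stats(file):
    """Load statistics previously written by `save_stats`."""
    with np.load(file) as data:
        return data["mean"], data["cov"]
```

The statistics are accumulated one sample at a time, so the full set of vectors never has to sit in memory at once. Once saved, they can be reloaded at inference time instead of being recomputed for each new context.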
That sounds great! It seems to confirm that this "unimodal" filtering is inherent to the model and not particularly context-dependent. Now I'm wondering what explains the gap between Q-Filters and pre-computed ExpectedAttention, since the two are starting to look more and more alike (maybe covariance? handling of positional encoding? forced sink tokens?). I would also like to see whether the saved stats help at larger compression ratios, where Q-Filters seems more effective than ExpectedAttention. I'll run the experiments when/if I have some time, let me know if you want to stay in the loop (maybe elsewhere :) ).
I would say:
@NathanGodey you still have 1 comment to address (not sure you received the notification in the thread)
Signed-off-by: giulio98 <[email protected]>
Signed-off-by: Max Jeblick <[email protected]>
PR description
Description of your PR. Fixes # (issue) (if applicable)
New press checklist (if applicable)
- mypress_press.py in the presses directory
- MyPress in __init__.py
- README.md with a 1 liner about my new press in the Available presses section
- default_presses list in tests/default_presses.py