Update for new version of HF transformers.#104

Closed
manueldeprada wants to merge 3 commits into NVIDIA:main from manueldeprada:patch-1
Conversation

@manueldeprada

We've recently merged a layer-wise refactor of the cache system in Transformers: huggingface/transformers#39106.

While testing your repo for compatibility, I had to adapt parts of the code to the new interface. To help with the migration, I've included my changes below. These are not intended as a full PR (I've only tested a small subset) but they should serve as a helpful guide.

Some updates are deprecations (e.g., cache.key_cache[i] is still supported via a backward-compatibility layer, though cache.layers[i].keys is preferred). However, there are also breaking changes, particularly in private attributes: for example, cache._quantized_key_cache is now cache.cache_processor._quantized_keys.
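For the deprecated public attributes, a small version-agnostic accessor can bridge both interfaces during migration. This is only a sketch: the stub classes below mimic the old and new cache shapes for illustration, using the attribute names mentioned above (`cache.layers[i].keys` vs. the deprecated `cache.key_cache[i]`), and are not the real Transformers classes.

```python
class _Layer:
    """Stand-in for a layer entry in the new layer-wise cache (illustration only)."""
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values


class NewStyleCache:
    """Mimics the post-refactor cache: per-layer objects under `.layers`."""
    def __init__(self, layer_keys):
        self.layers = [_Layer(k, None) for k in layer_keys]


class OldStyleCache:
    """Mimics the pre-refactor cache: a flat `.key_cache` list."""
    def __init__(self, layer_keys):
        self.key_cache = list(layer_keys)


def get_layer_keys(cache, layer_idx):
    # Prefer the new layer-wise interface when present; otherwise fall back
    # to the old flat-list attribute. Works on either cache shape.
    layers = getattr(cache, "layers", None)
    if layers is not None:
        return layers[layer_idx].keys
    return cache.key_cache[layer_idx]


new_cache = NewStyleCache(["k0", "k1"])
old_cache = OldStyleCache(["k0", "k1"])
print(get_layer_keys(new_cache, 1))  # k1
print(get_layer_keys(old_cache, 0))  # k0
```

Private attributes such as `_quantized_key_cache` have no compatibility layer, so call sites touching them need an explicit version branch rather than a `getattr` fallback.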

I also encountered some CUDA illegal memory access errors, which I suspect are related to: huggingface/transformers#39474 and contiguous memory requirements in FlashAttention v2.
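For context, the contiguity pitfall behind such errors looks like the following. This is a NumPy sketch of the stride issue only; FlashAttention v2 operates on torch tensors, and whether a given kvpress code path actually produces a non-contiguous view is an assumption based on the linked issue. Slicing along an inner dimension yields a strided view, and an explicit copy restores contiguity before handing the buffer to a kernel that requires it:

```python
import numpy as np

# (batch, heads, seq_len, head_dim) layout, as commonly used for KV caches.
k = np.zeros((2, 8, 16, 64))

# Selecting every other head produces a non-contiguous strided view.
k_view = k[:, ::2]
print(k_view.flags["C_CONTIGUOUS"])  # False

# An explicit copy makes the buffer contiguous again (torch's analogue
# would be tensor.contiguous()).
k_fixed = np.ascontiguousarray(k_view)
print(k_fixed.flags["C_CONTIGUOUS"])  # True
```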

In short, the upcoming Transformers release introduces necessary but potentially breaking changes that may impact this repo. I recommend testing against the main branch, and I'm happy to help if further issues come up.

@maxjeblick
Collaborator

Thanks a lot for opening this PR, we really appreciate this proactive engagement!

We merged KvZipPress; this press would also require some updates. It would be great if you could update this PR accordingly.

Regarding the next steps:

@maxjeblick
Collaborator

Hi @manueldeprada
As you may have noticed, the new refactoring of the attention implementation in Transformers, along with some other changes, currently breaks kvpress.

As this is a larger topic, the maintainers of this repo are currently working on a fix.

@manueldeprada
Author

Great! Please make sure to clone the main branch; we recently merged a further simplification of KV caches:
huggingface/transformers#39797

Hopefully this is the final stable interface!!

This PR should provide enough inspiration to quickly adapt KVPress. Ping me if there are further pain points!

@alessiodevoto
Collaborator

Closing this as we merged #115 (after updates from transformers side). Thanks again @manueldeprada for pointing this out 🙂 !
