Embedding4bit and Embedding8bit implementation #1292
Merged
matthewdouglas merged 6 commits into bitsandbytes-foundation:main on Aug 6, 2024
Conversation
Force-pushed from ef087bc to 67d546e
galqiwi (Contributor, Author)
Bump. P.S. The part about shared embeddings can be discussed later in another PR or issue.
Force-pushed from 67d546e to 35fd05c
matthewdouglas (Member)
Hi @galqiwi! Thank you for the PR! I think this would be a very useful addition and will review this week. I agree that the shared embeddings can be deferred to follow-up discussion/PRs.
Force-pushed from 35fd05c to 811aa6c
matthewdouglas (Member)
Thanks @galqiwi! Overall, this looks great! I just left a few minor nits, but otherwise happy to merge!
galqiwi (Contributor, Author)
Thank you for reviewing my PR, @matthewdouglas! I've fixed all the typos you found.
matthewdouglas (Member)
Thanks @galqiwi! This is a great contribution, and the unit tests here are really appreciated!
galqiwi (Contributor, Author)
Hi again! Are you planning on publishing a new release of bnb?
matthewdouglas added a commit to matthewdouglas/bitsandbytes that referenced this pull request on Oct 28, 2024:

Embedding4bit and Embedding8bit implementation (bitsandbytes-foundation#1292)
* Embedding4bit and Embedding8bit implementation
* lint
* Update bitsandbytes/nn/modules.py
* Update bitsandbytes/nn/modules.py
* Update bitsandbytes/nn/modules.py
* saving -> Saving
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Hi! I've been researching LLM quantization and found a bottleneck that I think this PR can fix.
When using extreme 1-bit and 2-bit LLM quantization (which has seen many improvements recently [1, 2, 3, 4, 5]), uncompressed embeddings can start to take up too much space (in some cases, more than 50% of the whole model).
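For a sense of scale, here is a back-of-the-envelope sketch with hypothetical model shapes (the vocabulary size, hidden dimension, and parameter count below are illustrative assumptions, not numbers from the linked references):

```python
# Estimate the share of checkpoint size taken by unquantized fp16 embeddings
# when the transformer body is quantized to 2 bits. All shapes are hypothetical.
vocab_size = 256_000   # assumed large multilingual vocabulary
hidden_dim = 2_048
body_params = 2.0e9    # assumed non-embedding parameter count

embedding_bytes = vocab_size * hidden_dim * 2   # fp16: 2 bytes per parameter
body_bytes = body_params * 2 / 8                # 2-bit: 0.25 bytes per parameter

share = embedding_bytes / (embedding_bytes + body_bytes)
print(f"embeddings take {share:.0%} of the checkpoint")  # ~68%
```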
I've documented this bottleneck in a huggingface/transformers issue, and it looks like the bitsandbytes library is a good place to start dealing with it.
In this PR, I implement embedding modules for the 4-bit and 8-bit quantization schemes from this library. Currently, they only support the _load_from_state_dict API and can't be saved, but I think they can still be useful. After that, I plan to integrate this functionality into the transformers library by extending the HfQuantizer functionality.
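A minimal usage sketch of the intended loading path (the constructor is assumed to mirror torch.nn.Embedding, and the exact point at which quantization happens depends on the implementation):

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb

vocab_size, hidden_dim = 32_000, 4_096

# An ordinary fp16 embedding whose weights we want to quantize on load.
fp16_embedding = nn.Embedding(vocab_size, hidden_dim).half()

# Build the quantized module and load the fp16 weights into it; loading goes
# through _load_from_state_dict, the only supported entry point for now.
emb4 = bnb.nn.Embedding4bit(vocab_size, hidden_dim)
emb4.load_state_dict(fp16_embedding.state_dict())
emb4 = emb4.cuda()  # Params4bit weights live on the GPU

input_ids = torch.tensor([[1, 2, 3]], device="cuda")
hidden_states = emb4(input_ids)  # lookup against the 4-bit weight
```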
What do you think?
There is also one thing I want to implement before going back to the transformers library: support for shared weights in 8-bit quantization.
While the 4-bit quantized linear layer does not seem to change its self.weight parameter during the forward pass, the 8-bit quantized linear layer changes it dramatically in the init_8bit_state method. So, while a 4-bit embedding and a 4-bit linear layer can share the same Params4bit parameter, that is not the case for 8-bit. I think that this patch should fix the problem, but this part of the code is tightly coupled to everything around it, and I need your advice: do you think it could break something important that I don't see?
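For concreteness, a minimal sketch of the tying pattern under discussion, for the 4-bit case where it works (the tying code is my illustration, not part of the patch, and the Embedding4bit constructor is again an assumption):

```python
import bitsandbytes as bnb

vocab_size, hidden_dim = 32_000, 4_096

embedding = bnb.nn.Embedding4bit(vocab_size, hidden_dim)
lm_head = bnb.nn.Linear4bit(hidden_dim, vocab_size, bias=False)

# Standard weight tying: both modules point at one Params4bit tensor of
# shape (vocab_size, hidden_dim). This stays consistent only because
# Linear4bit's forward reads the weight without mutating it.
lm_head.weight = embedding.weight

# The analogous assignment with Linear8bitLt would break: init_8bit_state
# moves the quantization statistics (CB/SCB) off the Int8Params and into
# layer-local state, so another module sharing that parameter loses them.
```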