
Embedding4bit and Embedding8bit implementation #1292

Merged
matthewdouglas merged 6 commits into bitsandbytes-foundation:main from galqiwi:embedding_quantization
Aug 6, 2024

Conversation

@galqiwi
Contributor

@galqiwi galqiwi commented Jul 24, 2024

Hi! I've been researching LLM quantization and found a bottleneck that I think this PR can fix.

When using extreme 1-bit and 2-bit LLM quantization (an area that has seen many improvements recently [1, 2, 3, 4, 5]), uncompressed embeddings can start to take up a disproportionate share of the model size (in some cases more than 50%).

https://galqiwi.ru/persistent/2024-06-18/embed-1.png
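To get a feel for the numbers, here is a rough back-of-the-envelope calculation. All the figures below (vocabulary size, hidden size, body parameter count) are assumptions for illustration, loosely modeled on a 7B-class model; they are not measurements from the plot above.

```python
# Rough, illustrative arithmetic: how much of a 2-bit-quantized model's
# size is taken up by fp16 embeddings. All numbers are assumptions.
vocab, hidden = 32000, 4096
embed_params = 2 * vocab * hidden      # input + output embedding tables
body_params = 6.5e9                    # non-embedding weights (assumed)

embed_bytes = embed_params * 2         # embeddings left uncompressed in fp16
body_bytes = body_params * 2 / 8       # body quantized to 2 bits per weight

frac = embed_bytes / (embed_bytes + body_bytes)
print(f"embeddings are ~{frac:.0%} of the quantized model")
```

Even under these conservative assumptions the embeddings account for roughly a quarter of the total; at 1-bit body quantization, or with smaller models (where embeddings are a larger fraction of all parameters), the share grows further.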

I've documented this bottleneck in a huggingface/transformers issue, and it looks like the bitsandbytes library is a good place to start dealing with it.

In this PR I implement embedding modules for the 4-bit and 8-bit quantization schemes from this library. Currently they only support the _load_from_state_dict API and can't be saved, but I think they can still be useful.

After that, I plan to integrate this functionality into the transformers library by extending HfQuantizer functionality.

What do you think?
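The core idea can be sketched outside the library: store the embedding table in int8 with one absmax scale per row, and dequantize only the rows that are actually looked up. This is a simplified illustration in plain NumPy, not the blockwise scheme bitsandbytes uses internally.

```python
import numpy as np

def quantize_rows(weight):
    # Per-row absmax quantization to int8 (illustrative only; bitsandbytes
    # uses its own blockwise quantization formats internally).
    scales = np.abs(weight).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0              # avoid division by zero for all-zero rows
    q = np.round(weight / scales).astype(np.int8)
    return q, scales

def embedding_lookup(q, scales, ids):
    # Dequantize only the looked-up rows, so the full fp32 table
    # never needs to be materialized.
    return q[ids].astype(np.float32) * scales[ids]

weight = np.random.randn(1000, 64).astype(np.float32)
q, scales = quantize_rows(weight)
out = embedding_lookup(q, scales, np.array([1, 5, 9]))
```

The memory win is that the table lives as one byte per value plus one scale per row, while the forward pass only pays the dequantization cost for the handful of token ids in the batch.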


There is also one thing I want to implement before going back to the transformers library: support for shared weights in 8-bit quantization.

While the 4-bit quantized linear layer does not appear to change its self.weight parameter during the forward pass, the 8-bit quantized linear layer changes it dramatically in its init_8bit_state method.

So, while a 4-bit embedding and linear layer can share the same Params4bit parameter, the same is not possible for 8-bit.

I think this patch should fix the problem, but this part of the code is tightly coupled to the code around it, so I need your advice: do you think it could break something important that I'm not seeing?
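A toy illustration of the sharing problem, in plain Python rather than bitsandbytes code: if an embedding and a linear layer hold references to the same parameter object, and the linear layer rewrites that parameter's data on first use (as Linear8bitLt's init_8bit_state effectively does when it converts the weight to its int8 representation), the embedding silently starts reading the transformed data. The class and method names below are stand-ins, not real library APIs.

```python
class SharedParam:
    """Stand-in for a shared weight tensor."""
    def __init__(self, data):
        self.data = data

class TiedEmbedding:
    def __init__(self, param):
        self.weight = param            # shares the object with the linear layer

    def forward(self, idx):
        return self.weight.data[idx]

class RebindingLinear:
    """Stand-in for Linear8bitLt: on first forward, the weight's contents
    are replaced with a quantized representation."""
    def __init__(self, param):
        self.weight = param

    def forward(self, x):
        # Simulate init_8bit_state rewriting the shared data in place.
        self.weight.data = [round(v) for v in self.weight.data]
        return sum(w * x for w in self.weight.data)

param = SharedParam([0.1, 0.9, 0.4])
emb = TiedEmbedding(param)
lin = RebindingLinear(param)
lin.forward(1.0)                       # mutates the shared parameter
stale = emb.forward(1)                 # embedding now sees rounded data
```

After the linear layer's first forward, `stale` is the rounded value rather than the original 0.9; the 4-bit path avoids this because the forward pass leaves the shared Params4bit untouched.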

@galqiwi
Contributor Author

galqiwi commented Jul 30, 2024

Bump

P.S. The part about shared embeddings can be discussed later in another PR or issue.

@galqiwi galqiwi force-pushed the embedding_quantization branch from 67d546e to 35fd05c on July 30, 2024 at 10:53
@matthewdouglas matthewdouglas self-assigned this Jul 30, 2024
@matthewdouglas matthewdouglas self-requested a review July 30, 2024 14:32
@matthewdouglas
Member

Hi @galqiwi! Thank you for the PR! I think this would be a very useful addition and will review it this week. I agree that the shared embeddings can be deferred to a follow-up discussion/PR.

@galqiwi galqiwi force-pushed the embedding_quantization branch from 35fd05c to 811aa6c on August 5, 2024 at 14:19
@matthewdouglas matthewdouglas added the Enhancement New feature or request label Aug 5, 2024
Comment thread bitsandbytes/nn/modules.py Outdated
@matthewdouglas
Member

Thanks @galqiwi! Overall, this looks great! I just left a few minor nits, but am otherwise happy to merge!

galqiwi and others added 4 commits August 6, 2024 15:42
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
@galqiwi
Contributor Author

galqiwi commented Aug 6, 2024

Thank you for reviewing my PR, @matthewdouglas! I've fixed all the typos you found.

@matthewdouglas
Member

Thanks @galqiwi! This is a great contribution, and the unit tests here are really appreciated!

@matthewdouglas matthewdouglas merged commit 6d714a5 into bitsandbytes-foundation:main Aug 6, 2024
@galqiwi
Contributor Author

galqiwi commented Sep 11, 2024

Hi again! Are you planning on publishing a new release of bnb?

matthewdouglas added a commit to matthewdouglas/bitsandbytes that referenced this pull request Oct 28, 2024
…on#1292)

* Embedding4bit and Embedding8bit implementation

* lint

* Update bitsandbytes/nn/modules.py

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update bitsandbytes/nn/modules.py

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update bitsandbytes/nn/modules.py

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* saving -> Saving

---------

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
