Feature request
This is just a request for a few lines of documentation.
I'd like to use BnB for 4-bit quantization, related to KV caching. Just that, nothing fancy on top like NN ops with quantization inside. I found that the fields of QuantState are not documented, and some non-obvious things seem to happen in quantize_4bit.
Say I call quantize_4bit with x of shape (a, b, c, blocksize). Let num_ch = a * b * c. I get back qx, state such that:
state.absmax.shape = (num_ch,)
qx.shape = (num_ch * blocksize // 2,), qx.dtype = torch.uint8
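For concreteness, here is a minimal check of those shapes (a sketch; I assume a CUDA device, float16 input, and the default quant_type):

import torch
from bitsandbytes.functional import quantize_4bit

a, b, c, blocksize = 4, 3, 38, 256
num_ch = a * b * c
x = torch.randn(a, b, c, blocksize, device="cuda", dtype=torch.float16)
qx, state = quantize_4bit(x, blocksize=blocksize)

assert state.absmax.shape == (num_ch,)            # one scale per block
assert qx.dtype == torch.uint8                    # two 4-bit codes per byte
assert qx.shape == (num_ch * blocksize // 2,)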
For my application, I need to be able to quantize and dequantize slices of the full tensor and write them back.
My best guess was that the memory layout of qx and state.absmax allows me to do qx.view(a, b, c, blocksize // 2) and state.absmax.view(a, b, c) and then work with those views. But this does not seem to work:
import torch
from bitsandbytes.functional import quantize_4bit

# x.shape = (4, 3, 38, 256), i.e. one block of 256 per innermost row
x = torch.randn(4, 3, 38, 256, device="cuda", dtype=torch.float16)
qx, state = quantize_4bit(x, blocksize=256)

# Quantize a slice along the third dimension on its own.
start, end = 10, 15
partx = x[:, :, start:end, :]
qx_part, state_part = quantize_4bit(partx, blocksize=256)

# Expectation 1: the per-block scales of the slice match those of the full tensor.
full = state.absmax.view(4, 3, 38)
part = state_part.absmax.view(4, 3, 5)
torch.testing.assert_close(full[:, :, start:end], part)

# Expectation 2: the packed 4-bit payload matches as well.
full = qx.view(4, 3, 38, -1)
part = qx_part.view(4, 3, 5, -1)
torch.testing.assert_close(full[:, :, start:end], part)
Both asserts fail.
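For reference, the whole-tensor round trip behaves as I expect; it is only the sliced case above that I cannot make work. A sketch, using dequantize_4bit from bitsandbytes.functional:

import torch
from bitsandbytes.functional import quantize_4bit, dequantize_4bit

x = torch.randn(4, 3, 38, 256, device="cuda", dtype=torch.float16)
qx, state = quantize_4bit(x, blocksize=256)
# blocksize, shape, and dtype are carried inside `state`.
x_hat = dequantize_4bit(qx, state)
assert x_hat.shape == x.shape  # values are lossy, but shape/dtype are restored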
Could you tell me what is happening, or better, document the code in functional? As it stands, the code just calls some torch.ops.bitsandbytes.quantize_4bit.default, which I cannot even find in the repo, and in any case I am not CUDA-knowledgeable.
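For what it is worth, these are the QuantState fields I would most like to see documented (my own reading of the source, possibly incomplete or wrong):

state.absmax      # per-block scales, shape (num_blocks,)
state.shape       # shape of the original tensor
state.dtype       # dtype of the original tensor, restored on dequantize
state.blocksize   # quantization block size
state.quant_type  # "fp4" or "nf4"
state.code        # the 16-entry 4-bit codebook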
Motivation
There is no native 4-bit quantization in PyTorch, and even the native 8-bit quantization is poorly documented.
There is value in quantization as such: I need it to compress KV caches, and also to compress activation checkpoints (for which CPU support would be nice!).
You seem to cater mainly to high-level users who want to run their NN training or inference without bothering with the internals. But since you do all the low-level work anyway, why not document it, so folks like me can use it?
Your contribution
I'd be a happy user of BnB for low-level 4-bit quantization if this were documented!