Fix quant_state None on AMD GPUs by caching quant_state_dict at load time #3

Open
0xDELUXA wants to merge 1 commit into mengqin:main from 0xDELUXA:fix/quant-state-none-on-amd

0xDELUXA commented Mar 18, 2026

AMD support was added to bitsandbytes in bitsandbytes-foundation/bitsandbytes#1846.

Params4bit.from_prequantized loads the weights correctly on AMD, but quant_state is None by inference time, which causes:

AssertionError: assert quant_state is not None

Fix: Cache the raw quant_state_dict as self._bnb_quant_state_dict at load time, and reconstruct the QuantState object on demand in forward() if quant_state is missing.

Impact: No behavioral change on Nvidia - the fallback only triggers when quant_state is None.
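The cache-and-rebuild pattern described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in this PR: `QuantStateStub`, `Params4bitStub`, `Linear4bitSketch`, and `load_prequantized` are hypothetical stand-ins, and only the attribute name `_bnb_quant_state_dict` comes from the description. The stub's `from_dict(qs_dict, device)` is assumed to resemble how a QuantState would be rebuilt from its serialized dict.

```python
from typing import Any, Dict, Optional


class QuantStateStub:
    """Hypothetical stand-in for a QuantState object (not the real class)."""

    def __init__(self, fields: Dict[str, Any], device: str) -> None:
        self.fields = fields
        self.device = device

    @classmethod
    def from_dict(cls, qs_dict: Dict[str, Any], device: str) -> "QuantStateStub":
        # Rebuild the state object from its raw, serialized dictionary.
        return cls(dict(qs_dict), device)


class Params4bitStub:
    """Hypothetical stand-in for a 4-bit parameter carrying a quant_state."""

    def __init__(self, data: Any, quant_state: Optional[QuantStateStub]) -> None:
        self.data = data
        self.quant_state = quant_state


class Linear4bitSketch:
    """Illustrative layer: caches the raw dict at load, rebuilds on demand."""

    def __init__(self) -> None:
        self.weight: Optional[Params4bitStub] = None
        self._bnb_quant_state_dict: Optional[Dict[str, Any]] = None
        self._device = "cuda"

    def load_prequantized(
        self, data: Any, quant_state_dict: Dict[str, Any], device: str = "cuda"
    ) -> None:
        # Keep a copy of the raw dict so the state can be rebuilt later.
        self._bnb_quant_state_dict = dict(quant_state_dict)
        self._device = device
        state = QuantStateStub.from_dict(quant_state_dict, device)
        self.weight = Params4bitStub(data, state)

    def forward(self, x: Any) -> Any:
        if self.weight is None:
            raise RuntimeError("weights have not been loaded")
        # Fallback path: only taken when quant_state has gone missing.
        if self.weight.quant_state is None:
            if self._bnb_quant_state_dict is None:
                raise RuntimeError("quant_state is None and no cached dict exists")
            self.weight.quant_state = QuantStateStub.from_dict(
                self._bnb_quant_state_dict, self._device
            )
        assert self.weight.quant_state is not None
        return x  # placeholder for the actual 4-bit matmul
```

When `quant_state` is already present, `forward()` leaves it untouched, which matches the stated impact: the rebuild branch runs only if the state was lost after loading.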

Tested on AMD Radeon RX 9060 XT with Windows ROCm, bitsandbytes, and z_image_turbo_nf4_v2.safetensors.
