[ROCm] Windows workflow for creating wheels with ROCm 7.2.1 support by sstamenk · Pull Request #1915 · bitsandbytes-foundation/bitsandbytes

sstamenk · 2026-04-06T15:46:16Z

Add Windows ROCm build support and documentation

Summary

Add CI workflow support for building bitsandbytes with ROCm on Windows, using pip-installable ROCm SDK wheels
Update documentation to reflect Windows ROCm support across installation guide, README, and troubleshooting

CI Build

.github/scripts/build-rocm.sh: Add Windows branch that installs ROCm SDK wheels from repo.radeon.com, expands the devel tarball via rocm-sdk init, and builds with Ninja + Clang from the SDK. Targets RDNA 3/3.5/4 consumer GPU architectures.
.github/workflows/python-package.yml: Add windows-2025 + ROCm 7.2.1 to the build-rocm matrix via include (gated to 7.2.1 only). Add conditional MSVC setup step and Linux-only disk cleanup guard.

CMakeLists.txt fixes

Path normalization: Use file(TO_CMAKE_PATH ...) when reading $ENV{ROCM_PATH} to convert Windows backslashes to forward slashes, preventing CMake escape errors in generated HIP compiler files.
HIP_PLATFORM detection: Set HIP_PLATFORM=amd explicitly because the pip SDK's hip-config.cmake cannot auto-detect it.
DLL output directory: Extend RUNTIME_OUTPUT_DIRECTORY / LIBRARY_OUTPUT_DIRECTORY to all WIN32 builds (not just MSVC), so Clang/Ninja HIP builds place the DLL in bitsandbytes/.
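
For readers less familiar with the escape problem behind the path normalization fix, here is a minimal Python sketch of the same idea. It is only an analogy to file(TO_CMAKE_PATH ...), not the CMake change itself, and the example path is hypothetical.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Return a Windows-style path written with forward slashes.

    Similar in spirit to CMake's file(TO_CMAKE_PATH ...): backslashes such as
    the one in "C:\\rocm" would otherwise be read as escape sequences when the
    path is pasted into a generated file.
    """
    return PureWindowsPath(path).as_posix()


# Hypothetical ROCM_PATH value; the real location depends on the environment.
rocm_path = r"C:\venv\Lib\site-packages\_rocm_sdk_devel"
normalized = to_forward_slashes(rocm_path)
```

PureWindowsPath is used so the sketch behaves the same on any host OS.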

Documentation

docs/source/installation.mdx: Add Windows row to ROCm wheel table, add Windows compile-from-source tab with pip SDK workflow, update PyTorch ROCm links to cover Windows, add warning about GPU arch coverage in pre-built wheels.
README.md: Add AMD GPU row to Windows platform support table.
docs/source/errors.mdx: Add generic CUDA/ROCm version mismatch troubleshooting section, ROCm GPU arch mismatch section, and general diagnostics guidance.

Minor fixes

bitsandbytes/diagnostics/cuda.py: Add amdhip64*.dll Windows pattern to HIP runtime library search.
bitsandbytes/cuda_specs.py: Mark get_rocm_warpsize() as dead code (no callers in the codebase).
tests/test_linear4bit.py: Guard torch.distributed.is_nccl_available() with getattr fallback so the FSDP test is cleanly skipped instead of crashing during collection on torch builds without distributed backends (e.g. Windows ROCm).
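
The getattr fallback in the test guard can be sketched roughly as below. This is an illustration of the pattern, not the exact code in tests/test_linear4bit.py; the helper name nccl_available and the SimpleNamespace stand-ins for torch.distributed are assumptions made so the example runs without torch installed.

```python
from types import SimpleNamespace


def nccl_available(distributed_module) -> bool:
    """Return True only if the module exposes is_nccl_available() and it reports True.

    On builds without distributed backends the attribute may be missing, so a
    direct call would raise AttributeError during test collection.
    """
    probe = getattr(distributed_module, "is_nccl_available", None)
    return bool(probe()) if callable(probe) else False


# Hypothetical stand-ins for torch.distributed on two kinds of builds.
full_build = SimpleNamespace(is_nccl_available=lambda: True)
stripped_build = SimpleNamespace()  # no distributed backends compiled in
```

In a real test module the result would typically feed a pytest.mark.skipif condition, so the FSDP test is reported as skipped rather than failing at import time.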

Testing

Verified locally on Windows 11 with AMD Radeon 780M (gfx1103):

CMake configures successfully with pip-installed ROCm SDK 7.2.1
HIP compilation produces libbitsandbytes_rocm72.dll
GPU quantization and Linear8bitLt pass end-to-end
The python -m bitsandbytes sanity check segfaults on gfx1103 when using the PyTorch wheels from repo.radeon.com/rocm/windows/rocm-rel-7.2.1/, because that PyTorch build lacks gfx1103 support; it works on other architectures
Full test suite run with TheRock 7.12 release from repo.amd.com/rocm/whl/gfx110X-all/ (which includes gfx1103 PyTorch support):

28 failed, 2628 passed, 160 skipped, 29 deselected, 30 xfailed, 110 warnings in 2100.71s (0:35:00)

The 28 failures are pre-existing fp32 numerical precision threshold tests -- not regressions from this PR.
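
To put the summary line in perspective, a small sketch that parses it and computes a pass rate. Defining the pass rate as passed / (passed + failed), ignoring skipped, deselected, and xfailed tests, is my own assumption and not something pytest reports.

```python
import re

SUMMARY = (
    "28 failed, 2628 passed, 160 skipped, 29 deselected, "
    "30 xfailed, 110 warnings in 2100.71s (0:35:00)"
)


def parse_summary(line: str) -> dict[str, int]:
    """Extract "<count> <label>" pairs from a pytest summary line."""
    return {label: int(count) for count, label in re.findall(r"(\d+) ([a-z]+)", line)}


counts = parse_summary(SUMMARY)
executed = counts["passed"] + counts["failed"]  # tests that ran to a pass/fail verdict
pass_rate = counts["passed"] / executed
print(f"{counts['failed']} of {executed} executed tests failed ({pass_rate:.2%} passed)")
```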

sstamenk · 2026-04-06T15:53:19Z

Note on GPU architecture list: Limited the Windows build targets to gfx1100;gfx1101;gfx1102;gfx1150;gfx1151;gfx1200;gfx1201 because those are the only architectures supported by the PyTorch wheels at repo.radeon.com/rocm/windows/rocm-rel-7.2.1/. Other architectures should work in theory -- I've confirmed this on gfx1103 (Radeon 780M) where bitsandbytes successfully compiles, loads, and runs quantization functions end-to-end. I can add other architectures to the list if we want to go that way.
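
A quick way to check whether a given GPU falls inside this list is sketched below. The helper name is_prebuilt_target is hypothetical, and the sketch assumes the architecture string is already a bare gfx name.

```python
# Architecture list targeted by the Windows build, as quoted in this PR.
BNB_ROCM_ARCH = "gfx1100;gfx1101;gfx1102;gfx1150;gfx1151;gfx1200;gfx1201"


def is_prebuilt_target(gpu_arch: str, arch_list: str = BNB_ROCM_ARCH) -> bool:
    """Return True if gpu_arch appears in the semicolon-separated arch list."""
    targets = {arch for arch in arch_list.split(";") if arch}
    return gpu_arch.strip().lower() in targets


covered = is_prebuilt_target("gfx1100")
not_covered = is_prebuilt_target("gfx1103")  # Radeon 780M: works when built locally, but not in the list
```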

Why not TheRock preview releases: I decided not to include TheRock 7.11/7.12 preview releases from repo.amd.com/rocm/whl/ because the artifacts are split by architecture family (e.g. gfx110X-all/, gfx120X-all/), which doesn't map cleanly to a single install URL in the build script. There are plans upstream to move to unified builds, so we can update this when that lands.

@matthewdouglas Are there any potential CI issues in adding this Windows support for ROCm that I'm not aware of?

github-actions · 2026-04-06T22:02:32Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

matthewdouglas · 2026-04-07T19:26:56Z

Looks good, thanks! A couple quick questions:

Is ROCm 7.2 the only version we want to build/support for Windows so far, or should there be any earlier ones?
On the Linux builds we set it to MinSizeRel and used --offload-compress to minimize size. Is this not supported for Windows?

sstamenk · 2026-04-07T20:18:15Z

Is ROCm 7.2 the only version we want to build/support for Windows so far, or should there be any earlier ones?

I opted only for 7.2.1 because of the way ROCm wheels are tagged at https://repo.radeon.com/rocm/windows/ (this version names the tarball rocm-<rocm_version>.tar.gz, while 7.1.1 uses rocm-0.1-dev0.tar.gz). In theory we can also do 7.1.1; I would just have to update the logic to parse this correctly.

For other versions I'm not sure. We can probably build using the HIP SDK, but that would be a completely different approach. This seemed easier, and I think the wheels are more recent (the latest version of HIP SDK is 7.1 afaik). I can look into this and address it in a follow-up if needed.
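
To illustrate the naming difference, here is a rough sketch of version parsing that accepts the 7.2.1-style name and rejects the 7.1.1-style one. It is not the logic in build-rocm.sh; the function name and the concrete file name rocm-7.2.1.tar.gz (an instance of the rocm-<rocm_version>.tar.gz pattern) are assumptions.

```python
import re
from typing import Optional

# Matches names of the form rocm-<major>.<minor>.<patch>.tar.gz only.
_VERSIONED_TARBALL = re.compile(r"^rocm-(\d+\.\d+\.\d+)\.tar\.gz$")


def rocm_version_from_tarball(name: str) -> Optional[str]:
    """Return the ROCm version embedded in the tarball name, or None if absent."""
    match = _VERSIONED_TARBALL.match(name)
    return match.group(1) if match else None


new_style = rocm_version_from_tarball("rocm-7.2.1.tar.gz")     # assumed 7.2.1 file name
old_style = rocm_version_from_tarball("rocm-0.1-dev0.tar.gz")  # 7.1.1 naming from the comment above
```

Supporting 7.1.1 would need a second source for the version, since the old-style name does not carry it.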

On the Linux builds we set it to MinSizeRel and used --offload-compress to minimize size. Is this not supported for Windows?

Added

matthewdouglas · 2026-04-08T21:00:56Z

Sticking to 7.2.1+ works for me, thanks!

0xDELUXA · 2026-04-14T11:48:12Z

This is great!

One thing I want to note is that bitsandbytes also works on GPUs other than those listed in:

bnb_rocm_arch="gfx1100;gfx1101;gfx1102;gfx1150;gfx1151;gfx1200;gfx1201"

It has been validated via my fork that it works on all ROCm (TheRock)–supported RDNA2, RDNA3, RDNA3.5, and RDNA4 GPUs on Windows.

sstamenk · 2026-04-14T12:39:32Z

I have no doubt that all of the TheRock-supported architectures work. The main issue is that TheRock artifacts are currently split by architecture family, which complicates generating a single wheel that supports all of the architectures. Once TheRock unifies them we can address this in a follow-up. That's the main reason why I decided to stick with 7.2.1 and the architectures that are built by its PyTorch wheels.

Are you saying that building all of the architectures works when only having for example gfx1151 specific wheels installed? I haven't explicitly tested this.

0xDELUXA · 2026-04-14T12:51:26Z

Are you saying that building all of the architectures works when only having for example gfx1151 specific wheels installed? I haven't explicitly tested this.

I’ve built a single upstream wheel with TheRock, covering all RDNA architectures, and we’ve tested it on certain GPUs across RDNA2 through RDNA4; it works as expected.

sstamenk · 2026-04-15T09:56:33Z

Let me do some testing and if I don't run into any issues, I'm open to expanding the Windows support matrix to cover the TheRock 7.11 and 7.12 releases from https://repo.amd.com/rocm/whl/. It would be up to @matthewdouglas if he agrees with adding this coverage.

There are 2 considerations to have in mind:

This would cover only Windows since the Linux workflow uses the officially released Ubuntu 22.04 docker images from https://hub.docker.com/r/rocm/dev-ubuntu-22.04 to build and those aren't available for TheRock releases AFAIK
The wheel URL would hardcode one gfx family, for example gfx110X-all (assuming the build works across other families as well)
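
The second consideration amounts to something like the sketch below, where one family directory is baked into the index URL. The helper name family_index_url is hypothetical; the base URL and the gfx110X-all directory name are taken from this thread.

```python
THEROCK_WHEEL_BASE = "https://repo.amd.com/rocm/whl/"


def family_index_url(family: str, base: str = THEROCK_WHEEL_BASE) -> str:
    """Join the wheel base URL and one architecture-family directory."""
    return f"{base.rstrip('/')}/{family.strip('/')}/"


# The build script would pin a single family, e.g. gfx110X-all.
index_url = family_index_url("gfx110X-all")
```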

0xDELUXA · 2026-04-15T10:31:59Z

Yeah, based on my experience, installing TheRock ROCm (Windows) for one architecture allows bitsandbytes to be built for all others.

Add Windows ROCm workflow for building wheels with ROCm 7.2.1 support

9c1dff7

sstamenk force-pushed the rocm_windows_workflow branch from dcf3e92 to 9c1dff7 on April 6, 2026 16:12

matthewdouglas added this to the v0.50.0 milestone Apr 6, 2026

matthewdouglas added Documentation Windows ROCm Build CI/CD labels Apr 6, 2026

Compress binary size with build flags

8b26b81

matthewdouglas merged commit 250cdb3 into bitsandbytes-foundation:main Apr 8, 2026
92 checks passed

Apophis3158 mentioned this pull request Apr 14, 2026

better install: streamline GPU detection and pip configuration patientx-cfz/comfyui-rocm#28

Open

matthewdouglas merged 2 commits into bitsandbytes-foundation:main from sstamenk:rocm_windows_workflow
