add support for ROCm-based toolchains (rocm-compilers, rompi, rfbf, rfoss)#5099

Merged
casparvl merged 15 commits into easybuilders:develop from Thyre:rocm-toolchain
Mar 17, 2026

Conversation

@Thyre
Collaborator

@Thyre Thyre commented Jan 23, 2026

This PR adds a rocm-compilers toolchain based on the extremely heavy lifting done for the LLVM toolchain.

We're reusing most of the efforts done there to get this toolchain running.
Marked as draft for now, since we're missing crucial parts such as math and MPI toolchains.

Since we're exploring as we go, we first need to get there with ECs...

Thyre added 6 commits January 23, 2026 15:44
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
@Thyre
Collaborator Author

Thyre commented Mar 4, 2026

Math libraries are currently blocked by OpenMathLib/OpenBLAS#5664.

Comment thread easybuild/toolchains/rfoss.py Outdated
Comment thread easybuild/toolchains/compiler/rocm_compilers.py Outdated
Comment thread easybuild/toolchains/rfbf.py Outdated
Comment thread easybuild/toolchains/rocm_compilers.py Outdated
Comment thread easybuild/toolchains/rompi.py Outdated
@casparvl
Contributor

While trying to build rocTracer using https://github.com/zerefwayne/easybuild-easyconfigs/blob/a65a69e2b9e1c6cba052045966e116ad69120018/easybuild/easyconfigs/r/rocTracer/rocTracer-4.1.0-rocm-compilers-ROCm-6.4.1.eb with the toolchain as defined here, we ran into an issue: as soon as actual HIP code was compiled, we got:

[ 35%] Building HIPCC object test/CMakeFiles/MatrixTranspose.dir/hip/MatrixTranspose_generated_MatrixTranspose.cpp.o
cd /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip && /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen2/software/CMake/3.31.3-GCCcore-14.2.0/bin/cmake -E make_directory /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip/.
cd /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip && /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen2/software/CMake/3.31.3-GCCcore-14.2.0/bin/cmake -D verbose:BOOL=1 -D build_configuration:STRING=RELEASE -D generated_file:STRING=/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip/./MatrixTranspose_generated_MatrixTranspose.cpp.o -P /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip/MatrixTranspose_generated_MatrixTranspose.cpp.o.cmake
-- Removing /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip/./MatrixTranspose_generated_MatrixTranspose.cpp.o
/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen2/software/CMake/3.31.3-GCCcore-14.2.0/bin/cmake -E remove /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip/./MatrixTranspose_generated_MatrixTranspose.cpp.o
-- Generating dependency file: /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip/MatrixTranspose_generated_MatrixTranspose.cpp.o.depend.pre
/home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/software/HIP/6.4.1-rocm-compilers-19.0.0-ROCm-6.4.1/bin/hipcc -M /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/roctracer-rocm-6.4.1/test/hip/MatrixTranspose.cpp -o /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip/MatrixTranspose_generated_MatrixTranspose.cpp.o.depend.pre -DAMD_INTERNAL_BUILD -I/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/roctracer-rocm-6.4.1/inc -I/home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/software/HIP/6.4.1-rocm-compilers-19.0.0-ROCm-6.4.1/include -I/home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/software/ROCm-LLVM/19.0.0-GCCcore-14.2.0-ROCm-6.4.1/include -I/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/src -I/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/roctracer-rocm-6.4.1/inc -I/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/roctracer-rocm-6.4.1/inc
Device not supported - Defaulting to AMD
sh: line 1: /gpfs/scratch1/nodespecific/tcn194/casparl.20649084/eb-zereuwhd/tmpmsxx08o4/rpath_wrappers/amdclangxx_wrapper/clang++: No such file or directory
failed to execute:/gpfs/scratch1/nodespecific/tcn194/casparl.20649084/eb-zereuwhd/tmpmsxx08o4/rpath_wrappers/amdclangxx_wrapper/clang++  --cuda-host-only -O3 --hip-device-lib-path="/home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/software/ROCm-LLVM/19.0.0-GCCcore-14.2.0-ROCm-6.4.1/amdgcn/bitcode"  -M -x hip /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/roctracer-rocm-6.4.1/test/hip/MatrixTranspose.cpp -o "/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip/MatrixTranspose_generated_MatrixTranspose.cpp.o.depend.pre" -DAMD_INTERNAL_BUILD -I/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/roctracer-rocm-6.4.1/inc -I/home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/software/HIP/6.4.1-rocm-compilers-19.0.0-ROCm-6.4.1/include -I/home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/software/ROCm-LLVM/19.0.0-GCCcore-14.2.0-ROCm-6.4.1/include -I/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/src -I/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/roctracer-rocm-6.4.1/inc -I/tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/roctracer-rocm-6.4.1/inc
CMake Error at MatrixTranspose_generated_MatrixTranspose.cpp.o.cmake:146 (message):
  Error generating
  /tmp/casparl/easybuild/build/rocTracer/4.1.0/rocm-compilers-19.0.0-ROCm-6.4.1/easybuild_obj/test/CMakeFiles/MatrixTranspose.dir/hip/./MatrixTranspose_generated_MatrixTranspose.cpp.o

I.e. it tries to use a compiler /gpfs/scratch1/nodespecific/tcn194/casparl.20649084/eb-zereuwhd/tmpmsxx08o4/rpath_wrappers/amdclangxx_wrapper/clang++ which clearly doesn't exist. @Thyre traced it down to https://github.com/ROCm/llvm-project/blob/c87081df219c42dc27c5b6d86c0525bc7d01f727/amd/hipcc/src/hipBin_amd.h#L383 which essentially just hard-codes clang++ as the compiler name, and searches in the path returned by getCompilerPath() - which I guess in our case would be the .../rpath_wrappers/amdclangxx_wrapper path.
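The failure mode described above can be reproduced in miniature. The following is only a sketch in Python, not hipcc's actual code: `resolve_hip_compiler` and `make_wrapper` are hypothetical helpers, and the directory layout is modelled on the paths in the log. It shows that joining a hard-coded `clang++` onto a wrapper directory that only contains `amdclang++` yields a path that does not exist.

```python
import os
import tempfile


def resolve_hip_compiler(compiler_dir):
    """Hypothetical stand-in for the lookup described above:
    the compiler name is hard-coded, only the directory varies."""
    return os.path.join(compiler_dir, "clang++")


def make_wrapper(wrapper_dir, name):
    """Create a dummy executable wrapper script (illustration only)."""
    path = os.path.join(wrapper_dir, name)
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path


def demo():
    """Return (found_before, found_after) for the hard-coded clang++ lookup."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Layout modelled on the log: one wrapper subdirectory per wrapped command.
        wrapper_dir = os.path.join(tmpdir, "rpath_wrappers", "amdclangxx_wrapper")
        os.makedirs(wrapper_dir)
        make_wrapper(wrapper_dir, "amdclang++")

        candidate = resolve_hip_compiler(wrapper_dir)
        found_before = os.path.exists(candidate)  # only amdclang++ is present

        # Placing a clang++ wrapper in the same directory makes the lookup succeed.
        make_wrapper(wrapper_dir, "clang++")
        found_after = os.path.exists(candidate)
        return found_before, found_after


if __name__ == "__main__":
    print(demo())
```

The second half of `demo` only illustrates why a `clang++` entry in that directory would satisfy the lookup; how such an entry should be provided is a separate question.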

Two possible solutions are:

  • Patch HIP so that their getHipCC function respects the configured compiler (e.g. the one set as CXX)
  • Simply use the clang/clang++ compilers in our rocm-compilers toolchain as well.

Discussed in Slack with @Thyre, and we'll go for the 2nd option. The amdclang compiler is just a symlink to amdllvm, which itself is a tiny (binary) wrapper that calls the equivalent regular clang compiler. I.e. amdclang calls amdllvm, which calls clang. Since this only requires a change on the EasyBuild side, it's easier than patching HIP indefinitely. And with TheRock, things may change anyway, so we might need to reconsider this whichever of the two solutions we choose.

Caspar van Leeuwen and others added 2 commits March 13, 2026 14:15
Comment thread easybuild/toolchains/compiler/rocm_compilers.py Outdated
@casparvl
Contributor

As discussed in the meeting, we want to put clang, clang++, flang in the same wrapper dirs as amdclang, amdclang++, amdflang.

To do this, we should (probably) change prepare_rpath_wrappers. But that's a rather huge function, and I guess we don't want to fully duplicate that for the rocm-compilers class.

For the first point: we would like clang++ to end up in the amdclang++ subdir. How do we do this? Add an extra argument that specifies an optional prefix (amd) for this subdir? It is pretty specific, and will probably only ever be used by rocm-compilers, but we definitely don't want to copy-paste the full prepare_rpath_wrappers code to make a rocm-compilers specific implementation - that'd be maintenance hell.

For the second point: we should probably add an option to prepare_rpath_wrappers to pass the compiler and linker list to be wrapped as arguments (default: None, with the old behavior as default).

What do you think @boegel ?
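To make the two ideas above concrete, here is a rough Python sketch. It is not EasyBuild's `prepare_rpath_wrappers`: `plan_wrappers`, `wrapper_subdir`, the `prefixed` argument and `DEFAULT_COMMANDS` are all hypothetical names, and the `+` to `x` subdirectory naming is only inferred from the `amdclangxx_wrapper` path in the log above. It shows an optional command list (default `None`, keeping the old behaviour) and an optional per-command subdir prefix.

```python
import os

# Hypothetical default list, standing in for "whatever the toolchain defines".
DEFAULT_COMMANDS = ["amdclang", "amdclang++", "amdflang"]


def wrapper_subdir(cmd, prefix=""):
    """Name of the wrapper subdirectory for a command.
    The '+' -> 'x' rule is an assumption inferred from 'amdclangxx_wrapper'."""
    return (prefix + cmd).replace("+", "x") + "_wrapper"


def plan_wrappers(wrappers_root, commands=None, prefixed=None):
    """Return {command: wrapper script path}.

    commands=None keeps the default list (the 'old behaviour').
    prefixed maps a command to a prefix used only for its subdirectory name,
    so that e.g. clang++ can live in the amdclang++ subdirectory.
    """
    if commands is None:
        commands = DEFAULT_COMMANDS
    prefixed = prefixed or {}
    plan = {}
    for cmd in commands:
        subdir = wrapper_subdir(cmd, prefix=prefixed.get(cmd, ""))
        plan[cmd] = os.path.join(wrappers_root, subdir, cmd)
    return plan


if __name__ == "__main__":
    plan = plan_wrappers(
        "/tmp/rpath_wrappers",
        commands=["amdclang++", "clang++"],
        prefixed={"clang++": "amd"},
    )
    for cmd, path in plan.items():
        print(cmd, "->", path)
```

With the prefix applied, `clang++` and `amdclang++` share one wrapper directory, while calling `plan_wrappers` without extra arguments behaves as before.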

@boegel
Member

boegel commented Mar 16, 2026

@casparvl I think it should suffice to just extend what prepare_rpath_wrappers does in easybuild/toolchains/compiler/rocm_compilers.py, I'm looking into that right now...

@boegel
Member

boegel commented Mar 16, 2026

boegel and others added 3 commits March 16, 2026 14:59
…+ create additional RPATH wrapper alongside them for clang/clang++/flang
use `amdclang`/`amdclang++`/`amdflang` compiler commands in ROCm compilers + create additional RPATH wrapper alongside them for `clang`/`clang++`/`flang`
@Thyre Thyre marked this pull request as ready for review March 17, 2026 09:48
@boegel boegel modified the milestones: 5.x, next release (5.2.2?) Mar 17, 2026
Contributor

@casparvl casparvl left a comment

This worked flawlessly in this build of 19 rocm-compilers based easyconfigs.

I'm a little surprised that the is_deprecated function seems to be defined somewhat inconsistently - e.g. I would have expected one for rfbf as well, and I would have expected the sanitization of the a/b suffix from rompi to also be needed in rfoss - but I see the same is true for gompi, gfbf and foss. Not going to make a big deal out of it here: if it wasn't an issue for the GCC-based toolchain, I'm not considering it an issue here.
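For readers unfamiliar with why an a/b suffix needs sanitizing before a version comparison, here is a minimal Python sketch. It is not the code from this PR: `version_key` and `is_older_than` are hypothetical helpers, and mapping `a` to `.01` and `b` to `.07` is an assumption made for illustration. The point is only that a version such as `2025a` has to be turned into something purely numeric before it can be compared safely.

```python
import re


def version_key(version):
    """Turn a toolchain version such as '2025a' into a comparable tuple.
    Assumed mapping: trailing 'a' -> '.01', trailing 'b' -> '.07'."""
    normalized = re.sub(r"a$", ".01", version)
    normalized = re.sub(r"b$", ".07", normalized)
    return tuple(int(part) for part in normalized.split("."))


def is_older_than(version, cutoff):
    """True if 'version' sorts strictly before 'cutoff' after normalization."""
    return version_key(version) < version_key(cutoff)


if __name__ == "__main__":
    print(version_key("2025a"), version_key("2024b"))
    print(is_older_than("2024b", "2025a"))
```

Without such a normalization step, comparing version components that mix digits and letters is error-prone, which is presumably why the sanitization exists in the MPI-level toolchain.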

One thing I'm wondering about is the higher toolchains. I don't currently have the means to test them, as we have no easyconfigs for those (yet). Should we keep them in this PR, and merge them so that we at least have a starting point in framework when we start with these easyconfigs? Or should we only add rocm-compilers in this PR, and move the rest to a separate PR (that we'll merge & test when we have easyconfigs for those)?

I personally have a small preference for merging this 'as is', since it's a bit easier to start experimenting with new easyconfigs (no need for running development builds and such) - and we can always fix issues if they surface then.

@boegel what do you think?

Comment thread easybuild/toolchains/rfoss.py Outdated
Comment thread easybuild/toolchains/rfoss.py Outdated
Contributor

@casparvl casparvl left a comment

Discussed with @boegel on chat. Since this is all new, it's fine that rfbf, rfoss and rompi haven't been tested yet - we'll cross that bridge when we get there. At least this way, we have the basis in framework.

Thanks a lot for all the effort @Thyre !

@casparvl casparvl merged commit 2b8e373 into easybuilders:develop Mar 17, 2026
40 checks passed
@boegel
Member

boegel commented Mar 17, 2026

> Discussed with @boegel on chat. Since this is all new, it's fine that rfbf, rfoss and rompi haven't been tested yet - we'll cross that bridge when we get there. At least this way, we have the basis in framework.
>
> Thanks a lot for all the effort @Thyre !

Confirmed, this is all new, so happy to see this merged and to follow up with specific PRs should any surprises arise while testing this further...

@boegel boegel changed the title Add ROCm toolchain based on LLVM toolchain efforts add support for ROCm-based toolchains (rocm-compilers, rompi, rfbf, rfoss) Apr 9, 2026