
enable hwloc, cuFFTMp, and HeFFTe support in GROMACS easyblock#3531

Open
bedroge wants to merge 10 commits into easybuilders:develop from bedroge:gromacs_heffte

Conversation

@bedroge
Contributor

@bedroge bedroge commented Dec 13, 2024

In EESSI we noticed that GROMACS builds currently show the following with gmx -version:

Multi-GPU FFT:       none
Hwloc support:       disabled

Hwloc is part of the foss toolchain and can be easily enabled.

For Multi-GPU FFT support, either cuFFTMp (https://manual.gromacs.org/documentation/current/install-guide/index.html#using-cufftmp) or HeFFTe (https://manual.gromacs.org/documentation/current/install-guide/index.html#using-heffte) is required. I was trying to add support for both, but cuFFTMp is part of NVHPC, and simply adding that as a dependency makes GROMACS pick up other things from that installation (e.g. OpenMP libraries). Since cuFFTMp also imposes some additional requirements (see https://docs.nvidia.com/hpc-sdk/cufftmp/usage/requirements.html), I've only added HeFFTe support for now. I've also just opened an easyconfigs PR for HeFFTe with CUDA support: easybuilders/easybuild-easyconfigs#22024. Once that's merged, I'll make another PR to add this as a dependency to the CUDA versions of GROMACS.
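On the CMake side this boils down to a few extra options; the option names (`-DGMX_HWLOC`, `-DGMX_USE_HEFFTE`, `-DHeffte_ROOT`) come from the GROMACS install guide, but the helper below and its example path are purely illustrative, not the actual easyblock code:

```python
# Illustrative sketch (not the actual easyblock code) of the extra CMake
# options needed to enable hwloc and HeFFTe support in a GROMACS build.
# The option names follow the GROMACS install guide; the HeFFTe root path
# is a made-up example.

def extra_configopts(heffte_root=None):
    """Return CMake options enabling hwloc, plus HeFFTe when a root is given."""
    opts = ['-DGMX_HWLOC=ON']
    if heffte_root:
        opts += ['-DGMX_USE_HEFFTE=ON', '-DHeffte_ROOT=%s' % heffte_root]
    return opts

print(' '.join(extra_configopts('/apps/HeFFTe/2.4.0')))
```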

Member

@ocaisa ocaisa left a comment


This looks good and matches what I can find in the docs. My only concern is that there is no version checking for the new options. I think the hwloc one has been there since 2016, but the HeFFTe one is more recent (I can't quite figure it out, but I think it is 2023, see https://gitlab.com/gromacs/gromacs/-/issues/4090). For our own use case we could make the check more recent than when it was first supported.

EDIT: Indeed, heffte seems to first appear in 2023.1: https://manual.gromacs.org/2023.1/install-guide/index.html

EDIT: The option for hwloc is first documented in 2016.4: https://manual.gromacs.org/2016.4/install-guide/index.html

@bedroge
Contributor Author

bedroge commented Dec 16, 2024

> This looks good, matches what I can find in the docs. My only concern is that there is no version-checking with the new options. I think the hwloc one is there since 2016, but the HeFFTe one is more recent (I can't quite figure it out but I think it is 2023, see https://gitlab.com/gromacs/gromacs/-/issues/4090). For our own use case we could make the check be more recent than when it was first supported.
>
> EDIT: Indeed, heffte seems to first appear in 2023.1: https://manual.gromacs.org/2023.1/install-guide/index.html
>
> EDIT: The option for hwloc is first documented in 2016.4: https://manual.gromacs.org/2016.4/install-guide/index.html

Thanks! I added the version checks; I hadn't seen your edits yet. But I also see hwloc being mentioned in the 2016.1 docs, and it's also in the code:
https://gitlab.com/gromacs/gromacs/-/blob/v2016.1/CMakeLists.txt?ref_type=tags#L506

HeFFTe is mentioned in the 2023 release notes (https://manual.gromacs.org/current/release-notes/2023/major/performance.html#pme-decomposition-support-with-cuda-and-sycl-backends), and also in the CMake file for 2023 (https://gitlab.com/gromacs/gromacs/-/blob/v2023/CMakeLists.txt?ref_type=tags#L741) and 2023.1 (https://gitlab.com/gromacs/gromacs/-/blob/v2023.1/CMakeLists.txt?ref_type=tags#L749).
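A minimal sketch of such a version gate, using the thresholds established in this thread (hwloc in 2016.1, HeFFTe in 2023); the helper names are hypothetical and the real easyblock may structure this differently:

```python
# Hypothetical version gates for the new options, based on the versions
# identified in this thread: -DGMX_HWLOC first appears in GROMACS 2016.1,
# HeFFTe support in GROMACS 2023. Not the actual easyblock code; assumes
# plain numeric version strings like '2023.1'.

def parse_version(version):
    """Turn a GROMACS version string like '2023.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split('.'))

def supports_hwloc(version):
    return parse_version(version) >= (2016, 1)

def supports_heffte(version):
    return parse_version(version) >= (2023,)

print(supports_hwloc('2016.4'), supports_heffte('2023.1'), supports_heffte('2022.5'))
# → True True False
```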

@bedroge
Contributor Author

bedroge commented Dec 16, 2024

One thing I was a little worried about is that the HeFFTe installation requires a GPU (for its tests), which means that simply installing GROMACS and its dependencies will also require a GPU if we enable this by default. Should we make it optional in some way (commenting out the HeFFTe dependency, or disabling the tests)? For EESSI it would already cause an issue right now, as we build on nodes without GPUs.

@al42and

al42and commented Jan 3, 2025

Hi!

Don't want to derail the discussion here, but, while I don't have any recent numbers, the situation has not changed much from what NVIDIA reports in their blog:

> We find cuFFTMp to be up to 2x faster [than HeFFTe]

HeFFTe has the benefit of supporting AMD and Intel GPUs, but it's not the best choice for CUDA installations. cuFFTMp has its own share of issues, as @bedroge outlined in the PR description, but I think the performance difference is relevant for evaluating which effort is more worthwhile.

Regarding versioning, can confirm that heffte (and cufftmp) were added in 2023, and hwloc was added in 2016.

@bedroge
Contributor Author

bedroge commented Jan 3, 2025

> Hi!
>
> Don't want to derail the discussion here, but, while I don't have any recent numbers, the situation has not changed much from what NVIDIA reports in their blog:
>
> > We find cuFFTMp to be up to 2x faster [than HeFFTe]
>
> HeFFTe has the benefit of supporting AMD and Intel GPUs, but it's not the best choice for CUDA installations. cuFFTMp has its own share of issues, as @bedroge outlined in the PR description, but I think the performance difference is relevant for evaluating which effort is more worthwhile.
>
> Regarding versioning, can confirm that heffte (and cufftmp) were added in 2023, and hwloc was added in 2016.

Thanks for your input, it's definitely a fair point. I initially added only HeFFTe support in this PR, as it seemed like the more logical default option (e.g. no additional hardware requirements like with cuFFTMp), and a first attempt at adding cuFFTMp support failed miserably 😅 But I can have another look at it; ultimately it would be nice if the easyblock supports both and people can choose between the two.

@bedroge
Contributor Author

bedroge commented Apr 28, 2026

Now that we have an easyconfig for cuFFTMp (https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/c/cuFFTMp/cuFFTMp-11.4.0-gompi-2025b-CUDA-12.9.1.eb), it's trivial to add support for it. I've done that in 23eda47.

Initially it didn't compile because it was picking up the cufft.h from CUDA instead of the one from cuFFTMp, causing errors like:

/home/bob/eessi/versions/2025.06/software/linux/x86_64/intel/cascadelake/software/cuFFTMp/11.4.0-gompi-2025b-CUDA-12.9.1/include/cufftMp.h:74:4: error: #error cuFFT and cuFFTMp version mismatch. .../math_libs/X.Y/include/cufftmp/ should be included before .../math_libs/X.Y/include/
   74 |   #error cuFFT and cuFFTMp version mismatch. .../math_libs/X.Y/include/cufftmp/ should be included before .../math_libs/X.Y/include/
      |    ^~~~~

I couldn't force it to pick up the header provided by cuFFTMp first (moving cuFFTMp up or down in the dependency list didn't help), but adding -Xcompiler -v to the nvcc command showed that it searches the source directory first. So I solved it by adding a symlink to the correct cufft.h in the source directory, which allowed me to complete the build. The tests are still failing for me, as I don't have a system that meets the requirements, see https://docs.nvidia.com/cuda/cufftmp/usage/requirements.html and https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html#nvshmem-installation-guide.
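The symlink workaround can be sketched as follows; the layout of the cuFFTMp installation (an include/cufftmp/ subdirectory holding its own cufft.h) and the helper name are assumptions for illustration, not the easyblock's actual code:

```python
# Illustrative sketch of the workaround described above: symlink cuFFTMp's
# cufft.h into the source directory, so that nvcc (which searches the
# source dir first) finds it before CUDA's own cufft.h. The install layout
# below is assumed for illustration only.
import os
import tempfile

def link_cufftmp_header(cufftmp_root, srcdir):
    """Symlink cuFFTMp's cufft.h into srcdir so it shadows CUDA's copy."""
    target = os.path.join(cufftmp_root, 'include', 'cufftmp', 'cufft.h')
    link = os.path.join(srcdir, 'cufft.h')
    if not os.path.lexists(link):
        os.symlink(target, link)
    return link

# Demo with a throwaway directory layout standing in for a real install:
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'include', 'cufftmp'))
open(os.path.join(root, 'include', 'cufftmp', 'cufft.h'), 'w').close()
srcdir = tempfile.mkdtemp()
link = link_cufftmp_header(root, srcdir)
print(os.path.islink(link))  # → True
```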

@bedroge bedroge requested a review from ocaisa April 28, 2026 14:50
@bedroge
Contributor Author

bedroge commented Apr 28, 2026

Assuming you cannot select the multi-GPU FFT library at runtime, we would need to select one at build time. We could do that by having an easyconfig parameter that controls this, or by having separate easyconfigs with a corresponding version suffix?
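If it ends up as an easyconfig parameter, the mapping could look something like this; the parameter name gpu_fft_library is hypothetical, while the CMake options -DGMX_USE_HEFFTE and -DGMX_USE_CUFFTMP are the ones documented by GROMACS:

```python
# Hypothetical mapping from an easyconfig parameter to CMake options for
# the multi-GPU FFT backend. The parameter name 'gpu_fft_library' is made
# up for this sketch; the CMake option names come from the GROMACS docs.

def multi_gpu_fft_opts(gpu_fft_library):
    """Map the hypothetical gpu_fft_library parameter to CMake options."""
    mapping = {
        None: [],
        'heffte': ['-DGMX_USE_HEFFTE=ON'],
        'cufftmp': ['-DGMX_USE_CUFFTMP=ON'],
    }
    if gpu_fft_library not in mapping:
        raise ValueError("unknown multi-GPU FFT library: %s" % gpu_fft_library)
    return mapping[gpu_fft_library]

print(multi_gpu_fft_opts('cufftmp'))  # → ['-DGMX_USE_CUFFTMP=ON']
```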

@bedroge
Contributor Author

bedroge commented Apr 28, 2026

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb"

@boegelbot

@bedroge: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3531 EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3531 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10308

Test results coming soon (I hope)...

Details

- notification for comment with ID 4336494167 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@bedroge
Contributor Author

bedroge commented Apr 28, 2026

@boegelbot please test @ jsc-zen3
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2023.1-foss-2022a.eb GROMACS-2025.4-foss-2025b.eb"

@boegelbot

@bedroge: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3531 EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2023.1-foss-2022a.eb GROMACS-2025.4-foss-2025b.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3531 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10310

Test results coming soon (I hope)...

Details

- notification for comment with ID 4336609134 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb

Build succeeded for 1 out of 1 (total: 28 mins 39 secs) (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA A100 80GB PCIe, 590.48.01, Python 3.9.25
See https://gist.github.com/boegelbot/ab97d5bb6bfe0c6b8b22e861b87b1b52 for a full test report.

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS GROMACS-2023.1-foss-2022a.eb

  • SUCCESS GROMACS-2025.4-foss-2025b.eb

Build succeeded for 2 out of 2 (total: 1 hour 11 mins 15 secs) (2 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.25
See https://gist.github.com/boegelbot/225f832fb2fec3d1ee57ebe047f9a51c for a full test report.

@bedroge
Contributor Author

bedroge commented Apr 28, 2026

> So, I solved things by adding a symlink to the correct cufft.h in the source dir, and that allowed me to complete the build.

Hmm, forgot to actually include that fix, but I just pushed it (cfb6f93).

@bedroge
Contributor Author

bedroge commented Apr 28, 2026

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb"

Comment thread on easybuild/easyblocks/g/gromacs.py (outdated)
Member

@ocaisa ocaisa left a comment


Looks good to me

@ocaisa
Member

ocaisa commented Apr 28, 2026

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb"

@bedroge bedroge changed the title enable hwloc and HeFFTe support in GROMACS easyblock enable hwloc, cuFFTMp, and HeFFTe support in GROMACS easyblock Apr 28, 2026
@boegelbot

@ocaisa: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3531 EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3531 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10314

Test results coming soon (I hope)...

Details

- notification for comment with ID 4339145797 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb

Build succeeded for 1 out of 1 (total: 28 mins 4 secs) (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA A100 80GB PCIe, 590.48.01, Python 3.9.25
See https://gist.github.com/boegelbot/9f6d82d29030b0597d7d12240a590bca for a full test report.
