Add GPU-aware MPI easyconfigs for ROCm#25902

Draft
zerefwayne wants to merge 17 commits into easybuilders:develop from zerefwayne:rocm-mpi

Conversation

@zerefwayne
Contributor

@zerefwayne zerefwayne commented May 3, 2026

The OpenMPI and rompi built in #25847 were not fully configured for ROCm. The dependencies of OpenMPI also need to be adapted for ROCm.
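
To make the dependency chain concrete, here is a minimal Python sketch of the build order it implies. It assumes only the edges visible on this page: the `dependencies` lists in the PMIx and PRRTE diffs and the `--with-*` options in the OpenMPI `configopts` quoted in the review comment. The `DEPS` mapping and the `build_order` helper are hypothetical; the dependencies of hwloc-ROCm, UCX-ROCm and UCC-ROCm are not shown here, so they are treated as leaves.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (package -> packages it needs), built from the
# PMIx/PRRTE diffs and the OpenMPI --with-* configure options on this page.
# Packages whose own dependencies are not shown are listed as leaves.
DEPS = {
    "libevent": [],
    "zlib": [],
    "hwloc-ROCm": [],
    "UCX-ROCm": [],
    "UCC-ROCm": [],
    "PMIx": ["libevent", "zlib", "hwloc-ROCm"],
    "PRRTE": ["libevent", "hwloc-ROCm", "PMIx"],
    "OpenMPI": ["PMIx", "PRRTE", "libevent", "hwloc-ROCm", "UCX-ROCm", "UCC-ROCm"],
}


def build_order(deps):
    """Return a list in which every package comes after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for pkg in build_order(DEPS):
        print(pkg)
```

Any order returned this way builds hwloc-ROCm before PMIx, PMIx before PRRTE, and OpenMPI last, which is why swapping `hwloc` for `hwloc-ROCm` has to be carried through PMIx and PRRTE as well.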

@github-actions github-actions Bot added the update label May 3, 2026
@github-actions

github-actions Bot commented May 3, 2026

Updated software PMIx-5.0.6-rocm-compilers-19.0.0-ROCm-6.4.1.eb

Diff against PMIx-6.1.0-GCCcore-15.2.0.eb

easybuild/easyconfigs/p/PMIx/PMIx-6.1.0-GCCcore-15.2.0.eb

diff --git a/easybuild/easyconfigs/p/PMIx/PMIx-6.1.0-GCCcore-15.2.0.eb b/easybuild/easyconfigs/p/PMIx/PMIx-5.0.6-rocm-compilers-19.0.0-ROCm-6.4.1.eb
index 6c5c09a11b..9b85fe95d1 100644
--- a/easybuild/easyconfigs/p/PMIx/PMIx-6.1.0-GCCcore-15.2.0.eb
+++ b/easybuild/easyconfigs/p/PMIx/PMIx-5.0.6-rocm-compilers-19.0.0-ROCm-6.4.1.eb
@@ -1,7 +1,9 @@
+# Author:   Aayush Joglekar <aayush.joglekar@surf.nl>
+
 easyblock = 'ConfigureMake'
 
 name = 'PMIx'
-version = '6.1.0'
+version = '5.0.6'
 
 homepage = 'https://pmix.org/'
 description = """Process Management for Exascale Environments
@@ -16,23 +18,22 @@ provide a reference implementation of the PMI-server that demonstrates
 the desired level of scalability.
 """
 
-toolchain = {'name': 'GCCcore', 'version': '15.2.0'}
+toolchain = {'name': 'rocm-compilers', 'version': '19.0.0-ROCm-6.4.1'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openpmix/openpmix/releases/download/v%(version)s']
 sources = ['%(namelower)s-%(version)s.tar.bz2']
-checksums = ['bb9021c8e100a376f5070ecca727f83a29b5f652dfe381793b88daa79a3b98a2']
+checksums = ['ea51baa0fdee688d54bc9f2c11937671381f00de966233eec6fd88807fb46f83']
 
 builddependencies = [
-    ('binutils', '2.45'),
-    ('Perl', '5.42.0'),
-    ('pkgconf', '2.5.1'),
+    ('Perl', '5.40.0'),
+    ('pkgconf', '2.3.0'),
 ]
 
 dependencies = [
     ('libevent', '2.1.12'),
-    ('zlib', '2.3.2'),
-    ('hwloc', '2.13.0'),
+    ('zlib', '1.3.1'),
+    ('hwloc-ROCm', '2.11.2'),
 ]
 
 configopts = ' --with-libevent=$EBROOTLIBEVENT --with-zlib=$EBROOTZLIB'

Diff against PMIx-5.0.8-GCCcore-14.3.0.eb

easybuild/easyconfigs/p/PMIx/PMIx-5.0.8-GCCcore-14.3.0.eb

diff --git a/easybuild/easyconfigs/p/PMIx/PMIx-5.0.8-GCCcore-14.3.0.eb b/easybuild/easyconfigs/p/PMIx/PMIx-5.0.6-rocm-compilers-19.0.0-ROCm-6.4.1.eb
index 51dc7b14d7..9b85fe95d1 100644
--- a/easybuild/easyconfigs/p/PMIx/PMIx-5.0.8-GCCcore-14.3.0.eb
+++ b/easybuild/easyconfigs/p/PMIx/PMIx-5.0.6-rocm-compilers-19.0.0-ROCm-6.4.1.eb
@@ -1,7 +1,9 @@
+# Author:   Aayush Joglekar <aayush.joglekar@surf.nl>
+
 easyblock = 'ConfigureMake'
 
 name = 'PMIx'
-version = '5.0.8'
+version = '5.0.6'
 
 homepage = 'https://pmix.org/'
 description = """Process Management for Exascale Environments
@@ -16,23 +18,22 @@ provide a reference implementation of the PMI-server that demonstrates
 the desired level of scalability.
 """
 
-toolchain = {'name': 'GCCcore', 'version': '14.3.0'}
+toolchain = {'name': 'rocm-compilers', 'version': '19.0.0-ROCm-6.4.1'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openpmix/openpmix/releases/download/v%(version)s']
 sources = ['%(namelower)s-%(version)s.tar.bz2']
-checksums = ['bf5f0a341d0ec7f465627a7570f4dcda3b931bc859256428a35f6c72f13462d0']
+checksums = ['ea51baa0fdee688d54bc9f2c11937671381f00de966233eec6fd88807fb46f83']
 
 builddependencies = [
-    ('binutils', '2.44'),
-    ('Perl', '5.40.2'),
-    ('pkgconf', '2.4.3'),
+    ('Perl', '5.40.0'),
+    ('pkgconf', '2.3.0'),
 ]
 
 dependencies = [
     ('libevent', '2.1.12'),
     ('zlib', '1.3.1'),
-    ('hwloc', '2.12.1'),
+    ('hwloc-ROCm', '2.11.2'),
 ]
 
 configopts = ' --with-libevent=$EBROOTLIBEVENT --with-zlib=$EBROOTZLIB'

Diff against PMIx-6.0.0-GCCcore-14.3.0.eb

easybuild/easyconfigs/p/PMIx/PMIx-6.0.0-GCCcore-14.3.0.eb

diff --git a/easybuild/easyconfigs/p/PMIx/PMIx-6.0.0-GCCcore-14.3.0.eb b/easybuild/easyconfigs/p/PMIx/PMIx-5.0.6-rocm-compilers-19.0.0-ROCm-6.4.1.eb
index 15e3705e89..9b85fe95d1 100644
--- a/easybuild/easyconfigs/p/PMIx/PMIx-6.0.0-GCCcore-14.3.0.eb
+++ b/easybuild/easyconfigs/p/PMIx/PMIx-5.0.6-rocm-compilers-19.0.0-ROCm-6.4.1.eb
@@ -1,7 +1,9 @@
+# Author:   Aayush Joglekar <aayush.joglekar@surf.nl>
+
 easyblock = 'ConfigureMake'
 
 name = 'PMIx'
-version = '6.0.0'
+version = '5.0.6'
 
 homepage = 'https://pmix.org/'
 description = """Process Management for Exascale Environments
@@ -16,23 +18,22 @@ provide a reference implementation of the PMI-server that demonstrates
 the desired level of scalability.
 """
 
-toolchain = {'name': 'GCCcore', 'version': '14.3.0'}
+toolchain = {'name': 'rocm-compilers', 'version': '19.0.0-ROCm-6.4.1'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openpmix/openpmix/releases/download/v%(version)s']
 sources = ['%(namelower)s-%(version)s.tar.bz2']
-checksums = ['bfe969966d0ce82e032739cac286239bd5ad74a831d7adae013284919f125318']
+checksums = ['ea51baa0fdee688d54bc9f2c11937671381f00de966233eec6fd88807fb46f83']
 
 builddependencies = [
-    ('binutils', '2.44'),
-    ('Perl', '5.40.2'),
-    ('pkgconf', '2.4.3'),
+    ('Perl', '5.40.0'),
+    ('pkgconf', '2.3.0'),
 ]
 
 dependencies = [
     ('libevent', '2.1.12'),
     ('zlib', '1.3.1'),
-    ('hwloc', '2.12.1'),
+    ('hwloc-ROCm', '2.11.2'),
 ]
 
 configopts = ' --with-libevent=$EBROOTLIBEVENT --with-zlib=$EBROOTZLIB'
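
Each `checksums` entry in these easyconfigs is the SHA256 digest of the corresponding source tarball, which EasyBuild verifies after downloading. Below is a minimal stand-alone sketch of that kind of check using Python's `hashlib`. The `sha256_of` and `verify_sha256` helpers are hypothetical and are not EasyBuild's own code; the tarball is assumed to have been downloaded to the current directory.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected):
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Checksum copied from PMIx-5.0.6-rocm-compilers-19.0.0-ROCm-6.4.1.eb.
    expected = "ea51baa0fdee688d54bc9f2c11937671381f00de966233eec6fd88807fb46f83"
    tarball = "pmix-5.0.6.tar.bz2"  # assumed download location
    if os.path.exists(tarball):
        print(verify_sha256(tarball, expected))
    else:
        print(f"{tarball} not found; download it first")
```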

Updated software PRRTE-3.0.8-rocm-compilers-19.0.0-ROCm-6.4.1.eb

Diff against PRRTE-4.1.0-GCCcore-15.2.0.eb

easybuild/easyconfigs/p/PRRTE/PRRTE-4.1.0-GCCcore-15.2.0.eb

diff --git a/easybuild/easyconfigs/p/PRRTE/PRRTE-4.1.0-GCCcore-15.2.0.eb b/easybuild/easyconfigs/p/PRRTE/PRRTE-3.0.8-rocm-compilers-19.0.0-ROCm-6.4.1.eb
index d35338ca0d..7ebfa03fd0 100644
--- a/easybuild/easyconfigs/p/PRRTE/PRRTE-4.1.0-GCCcore-15.2.0.eb
+++ b/easybuild/easyconfigs/p/PRRTE/PRRTE-3.0.8-rocm-compilers-19.0.0-ROCm-6.4.1.eb
@@ -1,27 +1,27 @@
+# Author:   Aayush Joglekar <aayush.joglekar@surf.nl>
+
 easyblock = 'ConfigureMake'
 
 name = 'PRRTE'
-version = '4.1.0'
+version = '3.0.8'
 
 homepage = 'https://docs.prrte.org/'
 description = """PRRTE is the PMIx Reference RunTime Environment"""
 
-toolchain = {'name': 'GCCcore', 'version': '15.2.0'}
+toolchain = {'name': 'rocm-compilers', 'version': '19.0.0-ROCm-6.4.1'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openpmix/prrte/releases/download/v%(version)s']
 sources = ['%(namelower)s-%(version)s.tar.bz2']
-checksums = ['285ad62b670075708b9fcfe14c54baa599733bc274d10502a82e8eebba0b7c70']
+checksums = ['e798192fa0ab38172818a109a6c89bcc37e4b1123ca150d8c115dee5231750de']
 
-builddependencies = [
-    ('binutils', '2.45'),
-    ('pkgconf', '2.5.1'),
-]
+builddependencies = [('binutils', '2.42')]
 
 dependencies = [
     ('libevent', '2.1.12'),
-    ('hwloc', '2.13.0'),
-    ('PMIx', '6.1.0'),
+    ('hwloc-ROCm', '2.11.2'),
+    # also picks up rocm-compilers version
+    ('PMIx', '5.0.6'),
 ]
 
 configopts = ' --with-libevent=$EBROOTLIBEVENT'

Diff against PRRTE-3.0.11-GCCcore-14.3.0.eb

easybuild/easyconfigs/p/PRRTE/PRRTE-3.0.11-GCCcore-14.3.0.eb

diff --git a/easybuild/easyconfigs/p/PRRTE/PRRTE-3.0.11-GCCcore-14.3.0.eb b/easybuild/easyconfigs/p/PRRTE/PRRTE-3.0.8-rocm-compilers-19.0.0-ROCm-6.4.1.eb
index 0eb711cd4e..7ebfa03fd0 100644
--- a/easybuild/easyconfigs/p/PRRTE/PRRTE-3.0.11-GCCcore-14.3.0.eb
+++ b/easybuild/easyconfigs/p/PRRTE/PRRTE-3.0.8-rocm-compilers-19.0.0-ROCm-6.4.1.eb
@@ -1,27 +1,27 @@
+# Author:   Aayush Joglekar <aayush.joglekar@surf.nl>
+
 easyblock = 'ConfigureMake'
 
 name = 'PRRTE'
-version = '3.0.11'
+version = '3.0.8'
 
 homepage = 'https://docs.prrte.org/'
 description = """PRRTE is the PMIx Reference RunTime Environment"""
 
-toolchain = {'name': 'GCCcore', 'version': '14.3.0'}
+toolchain = {'name': 'rocm-compilers', 'version': '19.0.0-ROCm-6.4.1'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openpmix/prrte/releases/download/v%(version)s']
 sources = ['%(namelower)s-%(version)s.tar.bz2']
-checksums = ['37af5a82d333a54c0bac358f06c194427b7dbfa7b8b85f2ddd1145acf71cfdd4']
+checksums = ['e798192fa0ab38172818a109a6c89bcc37e4b1123ca150d8c115dee5231750de']
 
-builddependencies = [
-    ('binutils', '2.44'),
-    ('pkgconf', '2.4.3'),
-]
+builddependencies = [('binutils', '2.42')]
 
 dependencies = [
     ('libevent', '2.1.12'),
-    ('hwloc', '2.12.1'),
-    ('PMIx', '5.0.8'),
+    ('hwloc-ROCm', '2.11.2'),
+    # also picks up rocm-compilers version
+    ('PMIx', '5.0.6'),
 ]
 
 configopts = ' --with-libevent=$EBROOTLIBEVENT'

Diff against PRRTE-4.0.0-GCCcore-14.3.0.eb

easybuild/easyconfigs/p/PRRTE/PRRTE-4.0.0-GCCcore-14.3.0.eb

diff --git a/easybuild/easyconfigs/p/PRRTE/PRRTE-4.0.0-GCCcore-14.3.0.eb b/easybuild/easyconfigs/p/PRRTE/PRRTE-3.0.8-rocm-compilers-19.0.0-ROCm-6.4.1.eb
index 4206a4aeab..7ebfa03fd0 100644
--- a/easybuild/easyconfigs/p/PRRTE/PRRTE-4.0.0-GCCcore-14.3.0.eb
+++ b/easybuild/easyconfigs/p/PRRTE/PRRTE-3.0.8-rocm-compilers-19.0.0-ROCm-6.4.1.eb
@@ -1,27 +1,27 @@
+# Author:   Aayush Joglekar <aayush.joglekar@surf.nl>
+
 easyblock = 'ConfigureMake'
 
 name = 'PRRTE'
-version = '4.0.0'
+version = '3.0.8'
 
 homepage = 'https://docs.prrte.org/'
 description = """PRRTE is the PMIx Reference RunTime Environment"""
 
-toolchain = {'name': 'GCCcore', 'version': '14.3.0'}
+toolchain = {'name': 'rocm-compilers', 'version': '19.0.0-ROCm-6.4.1'}
 toolchainopts = {'pic': True}
 
 source_urls = ['https://github.com/openpmix/prrte/releases/download/v%(version)s']
 sources = ['%(namelower)s-%(version)s.tar.bz2']
-checksums = ['3c2ec961e0ba0c99128c7bf3545f4789d55a85a70ce958e868ae5e3db6ed4de4']
+checksums = ['e798192fa0ab38172818a109a6c89bcc37e4b1123ca150d8c115dee5231750de']
 
-builddependencies = [
-    ('binutils', '2.44'),
-    ('pkgconf', '2.4.3'),
-]
+builddependencies = [('binutils', '2.42')]
 
 dependencies = [
     ('libevent', '2.1.12'),
-    ('hwloc', '2.12.1'),
-    ('PMIx', '6.0.0'),
+    ('hwloc-ROCm', '2.11.2'),
+    # also picks up rocm-compilers version
+    ('PMIx', '5.0.6'),
 ]
 
 configopts = ' --with-libevent=$EBROOTLIBEVENT'
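
The `# also picks up rocm-compilers version` comment relies on EasyBuild resolving a dependency that has no toolchain of its own against the parent easyconfig's toolchain (or a subtoolchain of it in the toolchain hierarchy). Below is a minimal sketch of how that resolution maps onto an easyconfig file name, assuming the usual `<name>-<version>-<toolchain name>-<toolchain version><versionsuffix>.eb` scheme, with the toolchain part omitted for the `system` toolchain. `easyconfig_filename` and `PARENT_TOOLCHAIN` are hypothetical names, not part of EasyBuild's API.

```python
def easyconfig_filename(name, version, toolchain, versionsuffix=""):
    """Build an easyconfig file name following the usual EasyBuild naming scheme.

    `toolchain` is a dict like {'name': ..., 'version': ...}; the 'system'
    toolchain contributes nothing to the file name.
    """
    parts = [name, version]
    if toolchain["name"] != "system":
        parts += [toolchain["name"], toolchain["version"]]
    return "-".join(parts) + versionsuffix + ".eb"


# Toolchain of the PRRTE easyconfig in this PR.
PARENT_TOOLCHAIN = {"name": "rocm-compilers", "version": "19.0.0-ROCm-6.4.1"}

if __name__ == "__main__":
    # A dependency listed as ('PMIx', '5.0.6') with no toolchain of its own.
    print(easyconfig_filename("PMIx", "5.0.6", PARENT_TOOLCHAIN))
    # prints PMIx-5.0.6-rocm-compilers-19.0.0-ROCm-6.4.1.eb
```

This is the file name added in this PR, so `('PMIx', '5.0.6')` in PRRTE resolves to the ROCm build of PMIx rather than to one of the GCCcore builds it was diffed against.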

@zerefwayne zerefwayne force-pushed the rocm-mpi branch 5 times, most recently from 1a06ebe to d46e69f May 3, 2026 07:43
@zerefwayne
Contributor Author

zerefwayne commented May 3, 2026

hwloc-2.11.2-rocm-compilers-19.0.0-ROCm-6.4.1.eb

Test report by @zerefwayne
SUCCESS
Build succeeded for 1 out of 1 (total: 1 min 59 secs) (1 easyconfigs in total)
nid005670 - Linux SLES 15.6, x86_64, AMD EPYC 7A53 64-Core Processor (zen3), 4 x AMD AMD INSTINCT MI200 (MCM) OAM LC MBA HPE C2 (model: 0x7408, driver: 6.10.5), Python 3.13.4
See https://gist.github.com/zerefwayne/e2cd6dcfcf0fcc76d434cdc183c41112 for a full test report.

@github-actions github-actions Bot added the new and 2025a labels May 3, 2026
@github-actions github-actions Bot removed the 2025a label May 3, 2026
@zerefwayne
Contributor Author

zerefwayne commented May 3, 2026

UCX-ROCm-1.18.0-rocm-compilers-19.0.0-ROCm-6.4.1.eb

Test report by @zerefwayne
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#4126
SUCCESS
Build succeeded for 1 out of 1 (total: 1 min 38 secs) (1 easyconfigs in total)
nid005670 - Linux SLES 15.6, x86_64, AMD EPYC 7A53 64-Core Processor (zen3), 4 x AMD AMD INSTINCT MI200 (MCM) OAM LC MBA HPE C2 (model: 0x7408, driver: 6.10.5), Python 3.13.4
See https://gist.github.com/zerefwayne/af817571087ce959179ccc5705cbb09f for a full test report.

@zerefwayne
Contributor Author

zerefwayne commented May 3, 2026

UCC-ROCm-1.3.0-rocm-compilers-19.0.0-ROCm-6.4.1.eb

Test report by @zerefwayne
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#4126
SUCCESS
Build succeeded for 1 out of 1 (total: 2 mins 50 secs) (1 easyconfigs in total)
nid005670 - Linux SLES 15.6, x86_64, AMD EPYC 7A53 64-Core Processor (zen3), 4 x AMD AMD INSTINCT MI200 (MCM) OAM LC MBA HPE C2 (model: 0x7408, driver: 6.10.5), Python 3.13.4
See https://gist.github.com/zerefwayne/a956f6ebc101e30bac1b82a968053a4b for a full test report.

@github-actions github-actions Bot added the change label May 3, 2026
@zerefwayne
Contributor Author

Test report by @zerefwayne
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#4126
FAILED
Build succeeded for 0 out of 1 (total: 1 min 12 secs) (1 easyconfigs in total)
nid005670 - Linux SLES 15.6, x86_64, AMD EPYC 7A53 64-Core Processor (zen3), 4 x AMD AMD INSTINCT MI200 (MCM) OAM LC MBA HPE C2 (model: 0x7408, driver: 6.10.5), Python 3.13.4
See https://gist.github.com/zerefwayne/a82b1bcadae2fb8a5723f8456ebce9db for a full test report.

@zerefwayne
Contributor Author

zerefwayne commented May 3, 2026

Test report by @zerefwayne
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#4126
FAILED
Build succeeded for 5 out of 6 (total: 21 mins 8 secs) (6 easyconfigs in total)
nid005670 - Linux SLES 15.6, x86_64, AMD EPYC 7A53 64-Core Processor (zen3), 4 x AMD AMD INSTINCT MI200 (MCM) OAM LC MBA HPE C2 (model: 0x7408, driver: 6.10.5), Python 3.13.4
See https://gist.github.com/zerefwayne/17bf696080b5bd4af3501ace89fa929e for a full test report.


OpenMPI-5.0.7-rocm-compilers-19.0.0-ROCm-6.4.1.eb failed because I used the eb installation on LUMI, which does not yet include our easyblock updates from easybuilders/easybuild-easyblocks#4119.

@zerefwayne
Contributor Author

zerefwayne commented May 3, 2026

OpenMPI-5.0.7-rocm-compilers-19.0.0-ROCm-6.4.1.eb

Test report by @zerefwayne
SUCCESS
Build succeeded for 1 out of 1 (total: 12 mins 57 secs) (1 easyconfigs in total)
nid005670 - Linux SLES 15.6, x86_64, AMD EPYC 7A53 64-Core Processor (zen3), 4 x AMD AMD INSTINCT MI200 (MCM) OAM LC MBA HPE C2 (model: 0x7408, driver: 6.10.5), Python 3.13.4
See https://gist.github.com/zerefwayne/960143b8d6c7cfff3fce6a50f8af44e8 for a full test report.

@zerefwayne
Contributor Author

Test report by @zerefwayne
SUCCESS
Build succeeded for 6 out of 6 (total: 28 mins 4 secs) (6 easyconfigs in total)
tcn288.local.snellius.surf.nl - Linux RHEL 9.6, x86_64, AMD EPYC 7H12 64-Core Processor (zen2), Python 3.13.4
See https://gist.github.com/zerefwayne/e6837856b07b7d0e1833d61fd471fac1 for a full test report.

@zerefwayne zerefwayne mentioned this pull request May 4, 2026
Comment on lines +58 to +65
configopts += '--with-pmix=$EBROOTPMIX '
configopts += '--with-prrte=$EBROOTPRRTE '
configopts += '--with-libevent=$EBROOTLIBEVENT '
configopts += '--with-hwloc=$EBROOTHWLOC '
configopts += '--with-ucx=$EBROOTUCX '
configopts += '--with-ucc=$EBROOTUCC '
configopts += '--with-rocm=$EBROOTHIP '
configopts += '--enable-mpi1-compatibility '
Contributor

Move configopts to the EasyBlock
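
One way to read this suggestion is to derive the `--with-*` options from whichever dependency roots are actually set, instead of hard-coding `$EBROOT*` names in the easyconfig. Below is a minimal stand-alone sketch of that idea. It assumes EasyBuild's usual convention for `$EBROOT*` variable names, where `-` in a software name becomes `MIN`, `+` becomes `PLUS` and `.` is dropped (as in `$EBROOTUCXMINCUDA` for UCX-CUDA). Under that convention a dependency named `hwloc-ROCm` would be expected to export `$EBROOTHWLOCMINROCM`, so looking variables up by dependency name avoids guessing. `ebroot_var`, `WITH_OPTS` and `build_configopts` are hypothetical names, not EasyBuild or easyblock API.

```python
import os

# Assumed character mapping used when deriving $EBROOT<NAME> variable names.
_CHARMAP = {"+": "plus", "-": "min", ".": ""}

# Hypothetical mapping from Open MPI --with-<opt> to the dependency providing it,
# taken from the configopts quoted above.
WITH_OPTS = [
    ("pmix", "PMIx"),
    ("prrte", "PRRTE"),
    ("libevent", "libevent"),
    ("hwloc", "hwloc-ROCm"),
    ("ucx", "UCX-ROCm"),
    ("ucc", "UCC-ROCm"),
    ("rocm", "HIP"),
]


def ebroot_var(name):
    """Return the $EBROOT variable name expected for a software name."""
    converted = "".join(_CHARMAP.get(c, c) for c in name)
    return "EBROOT" + converted.upper()


def build_configopts(env=os.environ):
    """Assemble --with-* flags only for dependencies whose root is set in env."""
    opts = []
    for opt, dep in WITH_OPTS:
        root = env.get(ebroot_var(dep))
        if root:
            opts.append(f"--with-{opt}={root}")
    opts.append("--enable-mpi1-compatibility")
    return " ".join(opts)


if __name__ == "__main__":
    print(build_configopts())
```

Inside a real easyblock, EasyBuild's `get_software_root` helper would normally take the place of the raw environment lookup.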

@zerefwayne
Contributor Author

Test report by @zerefwayne
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#4126
FAILED
Build succeeded for 5 out of 6 (total: 24 mins 21 secs) (6 easyconfigs in total)
nid005214 - Linux SLES 15.6, x86_64, AMD EPYC 7A53 64-Core Processor (zen3), Python 3.13.4
See https://gist.github.com/zerefwayne/9aa9bc7a18f9fb4bacee5fe80e8c1bbc for a full test report.

@zerefwayne
Contributor Author

Test report by @zerefwayne
SUCCESS
Build succeeded for 1 out of 1 (total: 15 mins 9 secs) (1 easyconfigs in total)
nid005214 - Linux SLES 15.6, x86_64, AMD EPYC 7A53 64-Core Processor (zen3), Python 3.13.4
See https://gist.github.com/zerefwayne/ef775883589b5d14a708b29bd061b6ef for a full test report.


2 participants