
{mpi}[GCCcore/10.3.0] OpenMPI v4.1.1 #12566

Merged
boegel merged 10 commits into easybuilders:develop from robert-mijakovic:20210412092552_new_pr_OpenMPI411rc2
May 11, 2021

@robert-mijakovic
Contributor

@robert-mijakovic robert-mijakovic commented Apr 12, 2021

(created using eb --new-pr)
Depends on #12521 (GCC, binutils), #12639 (libpciaccess), #12604 (pkg-config)

Comment thread easybuild/easyconfigs/p/Perl/Perl-5.32.1-GCCcore-10.3.0.eb
Micket
Micket previously requested changes Apr 13, 2021
Comment thread easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.1rc2-GCC-10.3.0.eb Outdated
Comment thread easybuild/easyconfigs/n/numactl/numactl-2.0.14-GCCcore-10.3.0.eb
@boegel
Member

boegel commented Apr 24, 2021

@robert-mijakovic Final OpenMPI v4.1.1 release is out now, see https://www.open-mpi.org/software/ompi/v4.1/

@boegel boegel changed the title from "{WIP} OpenMPI 4.1.1" to "{mpi}[GCCcore/10.3.0] OpenMPI v4.1.1" Apr 24, 2021
Comment thread easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.1rc2-GCC-10.3.0.eb Outdated
Comment thread easybuild/easyconfigs/u/UCX/UCX-1.10.0-GCCcore-10.3.0.eb Outdated
Comment thread easybuild/easyconfigs/o/OpenMPI/OpenMPI-4.1.1rc2-GCC-10.3.0.eb Outdated
@easybuilders easybuilders deleted a comment from boegelbot Apr 25, 2021
@boegel
Member

boegel commented Apr 25, 2021

@boegelbot please test @ generoso
CORE_COUNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=12566 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_12566 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 16916

Test results coming soon (I hope)...

Details

- notification for comment with ID 826294077 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
generoso-c1-s-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/1e93ab7f641377478d4365a5c4a30db6 for a full test report.

@boegel
Member

boegel commented Apr 25, 2021

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
node3512.doduo.os - Linux RHEL 8.2, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/bc936e191da9f3f3ef9c67414ff52ba2 for a full test report.

@boegel
Member

boegel commented Apr 25, 2021

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
node3169.skitty.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/dc31d657517cb17792bab4eb88720f10 for a full test report.

@boegel
Member

boegel commented Apr 25, 2021

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
node2690.swalot.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (haswell), Python 3.6.8
See https://gist.github.com/83f83eac7de2312407a08ab98deb94e7 for a full test report.

@boegel
Member

boegel commented Apr 25, 2021

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
select-pika-c6gd-2xlarge-0001 - Linux centos linux 8.3.2011, AArch64, ARM UNKNOWN (graviton2), Python 3.6.8
See https://gist.github.com/43349d1c89c2e5ba13f00c299c86a861 for a full test report.

@boegel
Member

boegel commented Apr 25, 2021

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
easybuild2.novalocal - Linux centos linux 8.3.2011, POWER, IBM pSeries (emulated by qemu) (power9le), Python 3.6.8
See https://gist.github.com/f62a22d55621b189b267b12c4a00c837 for a full test report.

@branfosj
Member

Running various OSU microbenchmarks (5.7) against OpenMPI 4.1.1 I'm getting:

--------------------------------------------------------------------------
Open MPI failed an OFI Libfabric library call (fi_endpoint).  This is highly
unusual; your job may behave unpredictably (and/or abort) after this.
  Local host: bear-pg0211u17a
  Location: mtl_ofi_component.c:513
  Error: Invalid argument (22)
--------------------------------------------------------------------------

I've filed ofiwg/libfabric#6710

Building libfabric with --disable-psm3 removes the issue.
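
As a rough sketch of that workaround in a libfabric easyconfig (easyconfigs use Python syntax): this assumes a configure-based build in which `configopts` is appended to the `./configure` command line. The fragment is illustrative only and is not taken from this PR.

```python
# Hypothetical libfabric easyconfig fragment (assumption, not from this PR).
# 'configopts' holds extra arguments for ./configure in a configure-based build.
configopts = '--disable-psm3 '  # skip building the PSM3 provider
```

The trailing space is only a convenience so that further options can be appended to the same string.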

@branfosj
Member

For libfabric:

  • The bug I am seeing suggests we should add --disable-psm3 for this version and look at enabling psm3 support in a future version when we have a clearer understanding of the problem
  • "OpenMPI-4.0.3-GCC-9.3.0 compilation error on rhel 7" (#11939) is more complicated. Based on my testing, we'd need a conditional OS dependency: if you have libnl v1, then you'll likely also need libnl v3
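
One way the conditional OS dependency from the second point could be expressed, as a minimal sketch: the helper name `libnl_osdeps`, the boolean flag, and the package names are all assumptions for illustration. How libnl v1 is detected on the host is deliberately left out. The tuple follows the easyconfig `osdependencies` convention, where a tuple lists alternative package names for different distributions.

```python
def libnl_osdeps(has_libnl_v1):
    """Return extra OS dependency entries (hypothetical helper, not from this PR)."""
    deps = []
    if has_libnl_v1:
        # If libnl v1 is present, also require libnl v3.
        # Package names are assumptions: RPM-style and Debian-style alternatives.
        deps.append(('libnl3-devel', 'libnl-3-dev'))
    return deps


# In an easyconfig, the result would feed the 'osdependencies' parameter.
osdependencies = libnl_osdeps(has_libnl_v1=True)
```

On a system without libnl v1, the helper returns an empty list, so no extra OS dependency is imposed.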

@branfosj
Member

branfosj commented May 4, 2021

@robert-mijakovic I've put the libfabric patch from upstream in robert-mijakovic#1

add libfabric psm3 init patch from upstream
@branfosj
Member

branfosj commented May 6, 2021

Test report by @branfosj
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
bear-pg0211u03a.bear.cluster - Linux RHEL 8.3, x86_64, Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (cascadelake), Python 3.6.8
See https://gist.github.com/07ac010b2c0a28a958297199d6814223 for a full test report.

@boegel
Member

boegel commented May 11, 2021

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=12566 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_12566 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 17080

Test results coming soon (I hope)...

Details

- notification for comment with ID 839047074 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented May 11, 2021

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
node3125.skitty.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/bdf89b63e20f7e0ff45812dca9eba697 for a full test report.

@boegel boegel dismissed Micket’s stale review May 11, 2021 19:47

requested changes made

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
generoso-c1-s-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/292eb0f51a61a1c0643ce236c89d966d for a full test report.

@boegel
Member

boegel commented May 11, 2021

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
node3503.doduo.os - Linux RHEL 8.2, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/cf169ac0940a818be861f08b41ac070d for a full test report.

@boegel
Member

boegel commented May 11, 2021

Going in, thanks @robert-mijakovic!

@boegel boegel merged commit 03aa9f9 into easybuilders:develop May 11, 2021
@boegel
Member

boegel commented May 11, 2021

Test report by @boegel
SUCCESS
Build succeeded for 6 out of 6 (6 easyconfigs in total)
select-pika-c6gd-2xlarge-0001 - Linux centos linux 8.3.2011, AArch64, ARM UNKNOWN (graviton2), Python 3.6.8
See https://gist.github.com/bf24faaaac40925ede14285a359e07f6 for a full test report.

5 participants