{lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6 #11629

Merged
lexming merged 6 commits into easybuilders:develop from
boegel:20201108102926_new_pr_SciPy-bundle202011
Nov 28, 2020

Conversation

@boegel
Member

@boegel boegel commented Nov 8, 2020

(created using eb --new-pr)
requires #11337 (intel/2020b), #11489 (foss/2020b)

note: marked as WIP since the versionsuffix should be removed (and the tests should be changed accordingly)

…SciPy-bundle-2020.11-intel-2020b-Python-3.8.6.eb
@boegel boegel added the update label Nov 8, 2020
@boegel
Member Author

boegel commented Nov 8, 2020

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3108.skitty.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/c4cbdde126de8ecac55c4987b319f310 for a full test report.

@boegel
Member Author

boegel commented Nov 8, 2020

Test report by @boegel
SUCCESS
Build succeeded for 15 out of 15 (2 easyconfigs in total)
node2406.golett.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (haswell), Python 2.7.5
See https://gist.github.com/7418d361621ab3dfa92cb89a3718dc71 for a full test report.

@boegel boegel changed the title {lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6 {lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6 (WIP) Nov 9, 2020
@easybuilders easybuilders deleted a comment from boegelbot Nov 9, 2020
@boegel boegel added the 2020b issues & PRs related to 2020b label Nov 9, 2020
@boegel boegel added this to the 4.3.2 milestone Nov 9, 2020
@boegel boegel changed the title {lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6 (WIP) {lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6 Nov 9, 2020
@easybuilders easybuilders deleted a comment from boegelbot Nov 9, 2020
@boegel
Member Author

boegel commented Nov 9, 2020

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
easybuild1.novalocal - Linux centos linux 8.2.2004, POWER, IBM pSeries (emulated by qemu) (power8le), Python 3.6.8
See https://gist.github.com/63a2955015aa5b1cc2c6621250d33604 for a full test report.

@boegel
Member Author

boegel commented Nov 9, 2020

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11629 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11629 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9724

Test results coming soon (I hope)...

Details

- notification for comment with ID 724282509 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member Author

boegel commented Nov 9, 2020

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3568.doduo.os - Linux RHEL 8.2, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/f7a4b7f0618fb5510c5986547cf2d51b for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
generoso-c1-s-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/2224cfc86d9db434b2fcfbbc48602a4f for a full test report.

@boegel boegel requested a review from Micket November 10, 2020 08:10
Micket previously approved these changes Nov 12, 2020
Contributor

@Micket Micket left a comment

Let's make a mental note to include hypothesis in Python in 2021a.

@Micket
Contributor

Micket commented Nov 12, 2020

I just now saw that we already have a stand-alone hypothesis easyconfig, used in PyTorch and some other thing.
(Somehow this is just a builddep in PyTorch, which seems a bit odd; hypothesis isn't a build-related package, is it?)
But since we are probably going to have such a module for 2020a and 2020b as well, should we use it here?

@boegel
Member Author

boegel commented Nov 12, 2020

> I just now saw that we already have a stand-alone hypothesis easyconfig, used in PyTorch and some other thing.
> (Somehow this is just a builddep in PyTorch, which seems a bit odd; hypothesis isn't a build-related package, is it?)
> But since we are probably going to have such a module for 2020a and 2020b as well, should we use it here?

hypothesis is a testing library, and it's actually only a build dep for numpy too, see https://github.com/numpy/numpy/tree/master/INSTALL.rst.txt#prerequisites .

So it makes total sense to make it a separate easyconfig and only include it as a build dep, I'll look into changing that 👍
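
As a rough illustration of the change discussed here, an easyconfig could list hypothesis under builddependencies rather than bundling it as an extension. This is only a sketch: the package name, version, toolchain, and Python version come from the PR title, while the hypothesis version is a placeholder and not taken from this PR.

```python
# Sketch of the relevant part of a SciPy-bundle easyconfig
# (easyconfig files use Python syntax).
name = 'SciPy-bundle'
version = '2020.11'

toolchain = {'name': 'foss', 'version': '2020b'}

# hypothesis is a testing library, so it is listed as a build-only dependency;
# 'X.Y.Z' is a placeholder version (assumption), to be replaced with the
# version of the stand-alone hypothesis easyconfig that is actually available
builddependencies = [
    ('hypothesis', 'X.Y.Z'),
]

# runtime dependencies stay in the regular dependencies list
dependencies = [
    ('Python', '3.8.6'),
]
```

With this layout, hypothesis is loaded while building and testing but is not a dependency of the resulting module.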

@boegel
Member Author

boegel commented Nov 12, 2020

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11629 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11629 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9763

Test results coming soon (I hope)...

Details

- notification for comment with ID 726113140 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member Author

boegel commented Nov 12, 2020

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node3401.kirlia.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), Python 2.7.5
See https://gist.github.com/5dc4e7b21c9a9b1c34fdfb52da36c175 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
generoso-c1-s-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/bd919ac550e9a0e335b494f52f2ef052 for a full test report.

@boegel boegel force-pushed the 20201108102926_new_pr_SciPy-bundle202011 branch from 8df1b02 to 4367b20 Compare November 16, 2020 15:02
@easybuilders easybuilders deleted a comment from boegelbot Nov 16, 2020
@easybuilders easybuilders deleted a comment from boegelbot Nov 16, 2020
@boegel
Member Author

boegel commented Nov 16, 2020

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11629 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11629 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9772

Test results coming soon (I hope)...

Details

- notification for comment with ID 728118363 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
generoso-c1-s-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/b178deeb993f3279c3d17dba1a445212 for a full test report.

@boegel
Member Author

boegel commented Nov 16, 2020

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node2304.phanpy.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (haswell), Python 2.7.5
See https://gist.github.com/53474dbcde81a6ef12edb20ab43b105c for a full test report.

Contributor

@Micket Micket left a comment

lgtm

@Micket
Contributor

Micket commented Nov 16, 2020

Test report by @Micket
FAILED
Build succeeded for 7 out of 13 (3 easyconfigs in total)
alvis-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/ef9794eae3ea0619d9a2d78cbd4c1ed6 for a full test report.

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
bear-pg0306u19a.bear.cluster - Linux RHEL 8.2, POWER, 8335-GTX (power9le), Python 3.6.8
See https://gist.github.com/f3ba48b399e418a5cbdf132f196daf03 for a full test report.

@boegel
Member Author

boegel commented Nov 16, 2020

@Micket Any idea what the problem is here in your failing test report?

Abort(1091215) on node 5 (rank 5 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........: 
MPID_Init(1149)..............: 
MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed
[1605544318.626885] [alvis-c1:196032:0]         select.c:444  UCX  ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable

@Micket
Contributor

Micket commented Nov 16, 2020

I forgot to set UCX_TLS for the VM I'm building on.

@Micket
Contributor

Micket commented Nov 16, 2020

Test report by @Micket
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 2.7.5
See https://gist.github.com/4db0242f4dbf0e230780c61045147caf for a full test report.

@Micket
Contributor

Micket commented Nov 16, 2020

Test report by @Micket
FAILED
Build succeeded for 1 out of 3 (3 easyconfigs in total)
vera-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 2.7.5
See https://gist.github.com/c14f65d4ee6670c909f5801a0bcefae7 for a full test report.

Contributor

@lexming lexming left a comment

LGTM

@lexming
Contributor

lexming commented Nov 19, 2020

Test report by @lexming
SUCCESS
Build succeeded for 7 out of 7 (3 easyconfigs in total)
node381.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/3e39596d2852c49762bdf001a418d2f7 for a full test report.

@Micket
Contributor

Micket commented Nov 20, 2020

I can't make sense of my build errors:

== 2020-11-20 11:01:20,166 run.py:222 INFO running cmd: python -c "import numexpr"
== 2020-11-20 11:01:20,411 extensioneasyblock.py:181 INFO Sanity check for numexpr successful!
== 2020-11-20 11:01:20,412 easyblock.py:2669 WARNING failing sanity check for 'numexpr' extension: (see log for details)

what?!
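
The import-based sanity check shown in the log can be reproduced by hand to see whether the import itself is what fails. Below is a minimal sketch that runs `python -c "import <module>"` in a subprocess and checks the exit status; the helper name `import_check` is made up for this illustration and is not part of EasyBuild.

```python
import subprocess
import sys


def import_check(module_name, python=sys.executable):
    """Return True if `python -c "import <module_name>"` exits with status 0."""
    result = subprocess.run(
        [python, "-c", "import %s" % module_name],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# 'json' is part of the standard library, so this check is expected to pass;
# replace it with 'numexpr' in an environment where the module is loaded
print(import_check("json"))
```

If the import succeeds on its own, the failure reported by the sanity check step is more likely caused by something earlier in the log than by the import command.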

@Micket
Contributor

Micket commented Nov 20, 2020

Test report by @Micket
FAILED
Build succeeded for 1 out of 3 (3 easyconfigs in total)
vera-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 2.7.5
See https://gist.github.com/904c3ae9c7d99eb1245aeab6051e0ffb for a full test report.

@boegel
Member Author

boegel commented Nov 23, 2020

@Micket There must be an error higher up? Are you sure you're using the numexpr easyblock from develop?
See easybuilders/easybuild-easyblocks#2022.

@schiotz
Contributor

schiotz commented Nov 27, 2020

> @Micket Any idea what the problem is here in your failing test report?
>
> Abort(1091215) on node 5 (rank 5 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(136)........: 
> MPID_Init(1149)..............: 
> MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed
> [1605544318.626885] [alvis-c1:196032:0]         select.c:444  UCX  ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable

We are seeing this error in some of our own code with intel/2020a. Apparently, calling MPI_Init from a program that is not started with mpiexec / mpirun causes this error. According to the MPI standard, an MPI implementation is strongly encouraged (but not required) to allow this situation and just initialize an MPI environment with a single process, as if called with mpiexec -n 1. But newer versions of Intel MPI apparently fail to do this. I found a bug report somewhere in which Intel claims to have fixed it in 2019 update 6, but that does not seem to be the case.
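
To tell whether a program is hitting this launcher-less initialization path, one option is to check for the environment variables that MPI launchers usually set for each process they start. This is a heuristic sketch, not something from this thread: the variable list is an assumption (`PMI_RANK` for Hydra-based launchers such as those of Intel MPI and MPICH, `OMPI_COMM_WORLD_RANK` for Open MPI), and the helper name is hypothetical.

```python
import os

# Environment variables commonly set by MPI launchers for each started process.
# The list is an assumption for illustration and is not exhaustive.
LAUNCHER_RANK_VARS = ("PMI_RANK", "OMPI_COMM_WORLD_RANK")


def started_by_mpi_launcher(env=None):
    """Return True if any known launcher rank variable is present in env."""
    env = os.environ if env is None else env
    return any(var in env for var in LAUNCHER_RANK_VARS)


if not started_by_mpi_launcher():
    # Without a launcher, MPI_Init has to set up a single-process environment
    print("No MPI launcher detected: MPI_Init would run without mpiexec/mpirun")
```

Running the same script through `mpiexec -n 1` should make the check return True, which can help separate launcher-related failures from transport-related ones.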

@Micket
Contributor

Micket commented Nov 27, 2020

@schiotz I was just building on a VM that doesn't have infiniband. I had to set UCX_TLS.

I'm rebuilding now with the easyblock from pr 2022

@schiotz
Contributor

schiotz commented Nov 27, 2020

@Micket Is that an issue? Running on machines without InfiniBand when MPI is installed with InfiniBand support? In that case, what should UCX_TLS be set to?

@Micket
Contributor

Micket commented Nov 27, 2020

Yes, seems to be. Intel MPI will try to use a transport that isn't supported. In fact, it's even an issue on older IB hardware, cf.
#10899
easybuilders/easybuild-easyblocks#2253

On my build machine (that completely lacks IB) I set UCX_TLS=self,tcp which seems to do the trick.
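
For a build or test driver written in Python, the same workaround can be applied by setting `UCX_TLS` in the environment of the child process. The sketch below assumes the `self,tcp` value mentioned above is appropriate for the machine in question; the helper name is made up for this example, and the child command only prints the variable to show that it was passed on.

```python
import os
import subprocess
import sys


def env_without_infiniband(base_env=None):
    """Return a copy of the environment with UCX restricted to self and TCP."""
    env = dict(os.environ if base_env is None else base_env)
    # value taken from the comment above; other machines may need other transports
    env["UCX_TLS"] = "self,tcp"
    return env


# Stand-in for the real build/test command: a child Python process that
# reports the UCX_TLS value it sees.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['UCX_TLS'])"],
    env=env_without_infiniband(),
    capture_output=True,
    text=True,
    check=True,
)
ucx_tls_seen_by_child = result.stdout.strip()
print(ucx_tls_seen_by_child)
```

Copying the environment instead of modifying `os.environ` keeps the setting limited to the processes that need it.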

@Micket
Contributor

Micket commented Nov 27, 2020

Test report by @Micket
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2022
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
vera-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 2.7.5
See https://gist.github.com/c1837efe1bb256c7b2900f48c273fadd for a full test report.

@lexming
Contributor

lexming commented Nov 28, 2020

Going in, thanks @boegel !

@lexming lexming merged commit b823998 into easybuilders:develop Nov 28, 2020
@boegel boegel deleted the 20201108102926_new_pr_SciPy-bundle202011 branch November 28, 2020 16:39
@boegel
Member Author

boegel commented Nov 28, 2020

> > @Micket Any idea what the problem is here in your failing test report?
> >
> > Abort(1091215) on node 5 (rank 5 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
> > MPIR_Init_thread(136)........: 
> > MPID_Init(1149)..............: 
> > MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed
> > [1605544318.626885] [alvis-c1:196032:0]         select.c:444  UCX  ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable
>
> We are seeing this error in some of our own code with intel/2020a. Apparently, calling MPI_Init from a program that is not started with mpiexec / mpirun causes this error. According to the MPI standard, an MPI implementation is strongly encouraged (but not required) to allow this situation and just initialize an MPI environment with a single process, as if called with mpiexec -n 1. But newer versions of Intel MPI apparently fail to do this. I found a bug report somewhere in which Intel claims to have fixed it in 2019 update 6, but that does not seem to be the case.

I can confirm this problem, it's very annoying, but it's only an issue with the impi in intel/2020a (it doesn't happen with intel/2020b, I think).

@lexming
Contributor

lexming commented Nov 28, 2020

@boegel that is the error without easybuilders/easybuild-easyblocks#2253

@boegel
Member Author

boegel commented Nov 28, 2020

> @boegel that is the error without easybuilders/easybuild-easyblocks#2253

I thought that fix was only relevant for multi-node runs? Perhaps not, ok :)

Labels

2020b (issues & PRs related to 2020b), update

6 participants