I tested the execution of a simple inter-node job between two nodes over our InfiniBand network with updates 5, 6 and 7 of Intel MPI v2019, and the results differ significantly between releases. All tests were carried out with iccifort/2020.1.217 as the base of the toolchain.
Characteristics of the testing system
- CPU: 2x Intel(R) Xeon(R) Gold 6126
- Adapter: Mellanox Technologies MT27700 Family [ConnectX-4]
- Operating System: CentOS 7.7
- Related system libraries: UCX v1.5.1, OFED v4.7-3.2.9
- ICC: v2020.1 (from EasyBuild)
- Resource manager: Torque
Steps to reproduce:
- Start a job on two nodes
- Load impi
- mpicc ${EBROOTIMPI}/test/test.c -o test
- mpirun ./test
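Since our resource manager is Torque, a minimal job script covering these steps could look as follows (a sketch: the resource request and walltime are assumptions; we ran one process on each of two nodes):

#!/bin/bash
#PBS -l nodes=2:ppn=1
#PBS -l walltime=00:10:00
# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
# Load one of the impi modules under test, e.g. update 5
module load impi/2019.5.281-iccifort-2020.1.217
# Compile the hello-world test shipped with Intel MPI and run it across both nodes
mpicc ${EBROOTIMPI}/test/test.c -o test
mpirun ./test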
Intel MPI v2019 update 5: works out of the box
$ module load impi/2019.5.281-iccifort-2020.1.217
$ fi_info --version
fi_info: 1.7.2a
libfabric: 1.7.2a
libfabric api: 1.7
$ fi_info | grep provider
provider: verbs;ofi_rxm
provider: verbs;ofi_rxd
provider: verbs
provider: verbs
provider: verbs
$ mpirun ./test
Hello world: rank 0 of 2 running on node357.hydra.os
Hello world: rank 1 of 2 running on node356.hydra.os
Intel MPI v2019 update 6: does NOT work out of the box, but can be fixed
$ module load impi/2019.6.166-iccifort-2020.1.217
$ fi_info --version
fi_info: 1.9.0a1
libfabric: 1.9.0a1-impi
libfabric api: 1.8
$ fi_info | grep provider
provider: mlx
provider: mlx;ofi_rxm
$ mpirun ./test
[1585832682.960816] [node357:302190:0] select.c:406 UCX ERROR no active messages transport to <no debug data>: self/self - Destination is unreachable, rdmacm/sockaddr - no am bcopy, mm/sysv - Destination is unreachable, mm/posix - Destination is unreachable, cma/cma - no am bcopy
Abort(1091471) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(703)........:
MPID_Init(958)...............:
MPIDI_OFI_mpi_init_hook(1382): OFI get address vector map failed
- Solution 1: use the verbs or tcp libfabric providers instead of mlx
$ module load impi/2019.6.166-iccifort-2020.1.217
$ FI_PROVIDER=verbs,tcp mpirun ./test
Hello world: rank 0 of 2 running on node357.hydra.os
Hello world: rank 1 of 2 running on node356.hydra.os
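The FI_PROVIDER override can also be exported once in the job script so that every subsequent mpirun picks it up (a sketch of the same workaround; Hydra propagates the environment to the ranks):

export FI_PROVIDER=verbs,tcp
mpirun ./test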
- Solution 2: keep the mlx provider, but for us it only works with UCX v1.7 (available in EasyBuild)
$ module load impi/2019.6.166-iccifort-2020.1.217
$ module load UCX/1.7.0-GCCcore-9.3.0
$ ucx_info -v
# UCT version=1.7.0 revision
# configured with: --prefix=/user/brussel/101/vsc10122/.local/easybuild/software/UCX/1.7.0-GCCcore-9.3.0 --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --enable-optimizations --enable-cma --enable-mt --with-verbs --without-java --disable-doxygen-doc
$ FI_PROVIDER=mlx mpirun ./test
Hello world: rank 0 of 2 running on node357.hydra.os
Hello world: rank 1 of 2 running on node356.hydra.os
- Solution 3: use an external libfabric v1.9.1 (upstream libfabric dropped the mlx provider with version 1.9.0). Clearing FI_PROVIDER_PATH below stops libfabric from loading the provider plugins bundled with Intel MPI.
$ module load impi/2019.6.166-iccifort-2020.1.217
$ module load libfabric/1.9.1-GCCcore-9.3.0
$ export FI_PROVIDER_PATH=
$ fi_info --version
fi_info: 1.9.1
libfabric: 1.9.1
libfabric api: 1.9
$ mpirun ./test
Hello world: rank 0 of 2 running on node357.hydra.os
Hello world: rank 1 of 2 running on node356.hydra.os
Intel MPI v2019 update 7: does NOT work at all
$ module load impi/2019.7.217-iccifort-2020.1.217
$ fi_info --version
fi_info: 1.10.0a1
libfabric: 1.10.0a1-impi
libfabric api: 1.9
$ fi_info | grep provider
provider: verbs;ofi_rxm
[...]
provider: tcp;ofi_rxm
[...]
provider: verbs
[...]
provider: tcp
[...]
provider: sockets
[...]
$ I_MPI_DEBUG=4 I_MPI_HYDRA_DEBUG=on FI_LOG_LEVEL=debug mpirun ./test
[mpiexec@node357.hydra.os] Launch arguments: /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin//hydra_bstrap_proxy --upstream-host node357.hydra.brussel.vsc --upstream-port 40969 --pgid 0 --launcher ssh --launcher-number 0 --base-path /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9
[mpiexec@node357.hydra.os] Launch arguments: /usr/bin/ssh -q -x node356.hydra.brussel.vsc /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin//hydra_bstrap_proxy --upstream-host node357.hydra.brussel.vsc --upstream-port 40969 --pgid 0 --launcher ssh --launcher-number 0 --base-path /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 1 --node-id 1 --subtree-size 1 /user/brussel/101/vsc10122/.local/easybuild/software/impi/2019.7.217-iccifort-2020.1.217/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@node357.hydra.os] Warning - oversubscription detected: 1 processes will be placed on 0 cores
[proxy:0:1@node356.hydra.os] pmi cmd from fd 4: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:1@node356.hydra.os] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1@node356.hydra.os] pmi cmd from fd 4: cmd=get_maxes
[proxy:0:1@node356.hydra.os] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:1@node356.hydra.os] pmi cmd from fd 4: cmd=get_appnum
[proxy:0:1@node356.hydra.os] PMI response: cmd=appnum appnum=0
[proxy:0:1@node356.hydra.os] pmi cmd from fd 4: cmd=get_my_kvsname
[proxy:0:1@node356.hydra.os] PMI response: cmd=my_kvsname kvsname=kvs_309778_0
[proxy:0:1@node356.hydra.os] pmi cmd from fd 4: cmd=get kvsname=kvs_309778_0 key=PMI_process_mapping
[proxy:0:1@node356.hydra.os] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:1@node356.hydra.os] pmi cmd from fd 4: cmd=barrier_in
(the execution does not abort; it just hangs at this point)
The system log of the node shows the following entry:
traps: hydra_pmi_proxy[549] trap divide error ip:4436ed sp:7ffed012ef50 error:0 in hydra_pmi_proxy[400000+ab000]
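One way to look for this kernel trap message on the affected node (a sketch; on CentOS 7 the message may also end up in /var/log/messages):

dmesg -T | grep hydra_pmi_proxy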
This error with Intel MPI v2019.7 occurs before libfabric is even initialized, so it does not depend on the libfabric provider or the UCX version, and it happens on every run. The oversubscription warning above ("1 processes will be placed on 0 cores") suggests that the divide error in hydra_pmi_proxy may stem from its core-count detection.
Update