Creating this issue to properly log all the progress.
How it started
It was observed that the VASP6 installation built with foss/2022a led to inaccurate results. After some digging the culprit was found: the DGGEV subroutine from LAPACK. To simplify debugging, we isolated the LAPACK tests from the official Netlib distribution (3.10.1) and ran them with different combinations of compiler flags and OpenBLAS versions.
What we have
The following tests were performed on an AMD EPYC Rome (zen2 architecture) machine:
OpenBLAS/0.3.15-GCC-10.3.0 (taken from foss/2021a) results in ~130 failed tests:
[wimr@int1 OUTPUT]$ grep failed foss-2021a-openblas-0.3.15/* | grep -v "error exits"
foss-2021a-openblas-0.3.15/ced.out: CEV: 4 out of 1096 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/ced.out: CVX: 24 out of 5196 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/cgd.out: CGV drivers: 5 out of 1092 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/cgd.out: CGV drivers: 6 out of 1092 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/zed.out: ZEV: 8 out of 1100 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/zed.out: ZVX: 36 out of 5208 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/zgd.out: ZGV drivers: 22 out of 1092 tests failed to pass the threshold
OpenBLAS/0.3.20-GCC-11.3.0 (taken from foss/2022a) results in ~4.2k failed tests:
[wimr@int1 OUTPUT]$ grep failed foss-2022a-openblas-0.3.20/* | grep -v "error exits"
foss-2022a-openblas-0.3.20/ced.out: CEV: 30 out of 1122 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/ced.out: CVX: 194 out of 5366 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgd.out: CGV drivers: 129 out of 1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgd.out: CGV drivers: 135 out of 1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgd.out: CGS drivers: 123 out of 1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgd.out: CGS drivers: 126 out of 1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgg.out: CGG: 119 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgg.out: CGG: 115 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgg.out: CGG: 117 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgg.out: CGG: 116 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgd.out: DGS drivers: 129 out of 1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgd.out: DGS drivers: 129 out of 1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgd.out: DGV drivers: 166 out of 1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgd.out: DGV drivers: 171 out of 1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgg.out: DGG: 143 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgg.out: DGG: 146 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgg.out: DGG: 163 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgg.out: DGG: 150 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgd.out: SGS drivers: 144 out of 1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgd.out: SGS drivers: 132 out of 1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgd.out: SGV drivers: 173 out of 1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgd.out: SGV drivers: 186 out of 1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgg.out: SGG: 153 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgg.out: SGG: 140 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgg.out: SGG: 147 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgg.out: SGG: 150 out of 2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/zed.out: ZEV: 50 out of 1142 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/zed.out: ZVX: 296 out of 5468 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/zgd.out: ZGV drivers: 52 out of 1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/zgd.out: ZGV drivers: 50 out of 1092 tests failed to pass the threshold
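The approximate totals above ("~130", "~4.2k") are just the per-category counts summed. A small helper for that (a sketch; it assumes the `N out of M tests failed` line format shown above):

```shell
# Sum the "N out of M tests failed" counts across one or more output
# files, skipping the unrelated "error exits" lines, to get the grand
# total of failed numerical tests.
sum_failures() {
  grep -h failed "$@" | grep -v "error exits" \
    | awk '{ for (i = 2; i <= NF; i++) if ($i == "out") total += $(i - 1) }
           END { print total + 0 }'
}
# e.g.: sum_failures foss-2022a-openblas-0.3.20/*
```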
OpenBLAS/0.3.15-GCC-11.3.0 (new build) results in ~4.2k failed tests
@zao got similar results on a Ryzen 9 3900X (zen2 desktop) when building the LAPACK tests with the full foss/2022a toolchain and a buildenv that picked up FlexiBLAS as the USE_OPTIMIZED_BLAS implementation:
[easybuild@eb-rocky8 build-lapack-ob-0.3.20-benv]$ grep failed TESTING/testing_results.txt
SGG: 163 out of 2184 tests failed to pass the threshold
SGG: 159 out of 2184 tests failed to pass the threshold
SGG: 162 out of 2184 tests failed to pass the threshold
SGG: 155 out of 2184 tests failed to pass the threshold
SGS drivers: 144 out of 1560 tests failed to pass the threshold
SGS drivers: 159 out of 1560 tests failed to pass the threshold
SGV drivers: 180 out of 1092 tests failed to pass the threshold
SGV drivers: 184 out of 1092 tests failed to pass the threshold
STFSM auxiliary routine: 1 out of 7776 tests failed to pass the threshold
DGG: 161 out of 2184 tests failed to pass the threshold
DGG: 150 out of 2184 tests failed to pass the threshold
DGG: 166 out of 2184 tests failed to pass the threshold
DGG: 151 out of 2184 tests failed to pass the threshold
DGS drivers: 135 out of 1560 tests failed to pass the threshold
DGS drivers: 156 out of 1560 tests failed to pass the threshold
DGV drivers: 174 out of 1092 tests failed to pass the threshold
DGV drivers: 172 out of 1092 tests failed to pass the threshold
CEV: 30 out of 1122 tests failed to pass the threshold
CVX: 194 out of 5366 tests failed to pass the threshold
CGG: 122 out of 2184 tests failed to pass the threshold
CGG: 118 out of 2184 tests failed to pass the threshold
CGG: 129 out of 2184 tests failed to pass the threshold
CGG: 121 out of 2184 tests failed to pass the threshold
CGV drivers: 135 out of 1092 tests failed to pass the threshold
CGV drivers: 121 out of 1092 tests failed to pass the threshold
CGS drivers: 126 out of 1560 tests failed to pass the threshold
CGS drivers: 135 out of 1560 tests failed to pass the threshold
ZHS: 1 out of 1764 tests failed to pass the threshold
ZHS: 1 out of 1764 tests failed to pass the threshold
ZHS: 1 out of 1764 tests failed to pass the threshold
ZHS: 1 out of 1764 tests failed to pass the threshold
ZEV: 50 out of 1142 tests failed to pass the threshold
ZVX: 296 out of 5468 tests failed to pass the threshold
ZGV drivers: 54 out of 1092 tests failed to pass the threshold
ZGV drivers: 39 out of 1092 tests failed to pass the threshold
The main question: are the failing tests caused by FlexiBLAS or by the optimization flags?
Update 1
From @zao :
Stripping -ftree-vectorize from the build flags that buildenv sets (leaving -O2 -march=native) makes it behave, so it's probably the better vectorizer in GCC 11 surfacing some latent problem in OpenBLAS. It wouldn't be the first time...
Remaining failures with -ftree-vectorize stripped:
STFSM auxiliary routine: 1 out of 7776 tests failed to pass the threshold
CEV: 4 out of 1096 tests failed to pass the threshold
CVX: 24 out of 5196 tests failed to pass the threshold
CGV drivers: 5 out of 1092 tests failed to pass the threshold
CGV drivers: 5 out of 1092 tests failed to pass the threshold
ZEV: 8 out of 1100 tests failed to pass the threshold
ZVX: 36 out of 5208 tests failed to pass the threshold
ZGV drivers: 26 out of 1092 tests failed to pass the threshold
ZGV drivers: 26 out of 1092 tests failed to pass the threshold
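Stripping the flag can be done on top of what buildenv exports before configuring; a sketch (the list of flag variables is an assumption about what buildenv sets in a given installation):

```shell
# Print a flags string with -ftree-vectorize removed and whitespace
# normalized, so "-O2 -ftree-vectorize -march=native" becomes
# "-O2 -march=native".
strip_tree_vectorize() {
  printf '%s\n' "$1" \
    | sed -e 's/-ftree-vectorize//g' -e 's/  */ /g' -e 's/^ *//' -e 's/ *$//'
}

# Apply it to the flag variables buildenv typically sets (adjust the
# list to match your environment).
for var in CFLAGS CXXFLAGS FFLAGS FCFLAGS F90FLAGS; do
  eval "val=\${$var:-}"
  export "$var=$(strip_tree_vectorize "$val")"
done
```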
Update 2
From @zao
I've set up a fresh environment on a Haswell machine and got the same degree of breakage as on our zen2, so it's not µarch-dependent. Steps:
Make and install a buildenv-default-GCC-11.3.0.eb
$ ml GCC/11.3.0 OpenBLAS/0.3.20 CMake/3.23.1
$ ml buildenv # defines all the various flags variables to "-O2 -ftree-vectorize -march=native"
$ tar xf v3.10.1.tar.gz # extract LAPACK sources
$ cmake -B build-tests lapack-3.10.1/ -DUSE_OPTIMIZED_BLAS=ON -DBUILD_TESTING=ON -DBLAS_LIBRARIES=$EBROOTOPENBLAS/lib/libopenblas.so
$ cmake --build build-tests -j 4 && cmake --build build-tests -t test
$ (cd lapack-3.10.1; ./lapack_testing.py; grep failed TESTING/testing_results.txt)
Update 3
From @zao
Ran some exhaustive tests on zen2 from GCC 9.5.0 through GCC 12.2.0 with OpenBLAS 0.3.20. It's not looking great.
I'll try to provide data later, but it seems that starting with GCC 12 we get elevated test error rates even without -ftree-vectorize, though builds with the flag show fewer categories of test errors than GCC 11 does.
Interestingly enough, even on the 9.5 and 10 series the error counts differ slightly with and without the flag. I don't know enough about this test suite to tell whether any errors at all are a problem.
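Comparing which categories fail (rather than raw counts) across two builds can be done by extracting the category names from a `testing_results.txt`-style file and diffing; a sketch:

```shell
# Print the distinct failing test categories (e.g. "SGG", "SGV drivers")
# from a testing_results.txt-style file, one per line, so two runs can
# be compared with diff.
failing_categories() {
  grep failed "$1" | grep -v "error exits" | sed 's/:.*//' | sort -u
}
# e.g.: diff <(failing_categories with_flag.txt) \
#            <(failing_categories without_flag.txt)
```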
Update 4
I got the following numbers of numerical errors using lapack_testing.py -p x -t eig from the LAPACK distribution:
Build with GCC-11.3:
-O2 -march=znver2 -funroll-all-loops -fno-math-errno -ftree-vectorize : 4090
-O2 -march=znver2 -funroll-all-loops -fno-math-errno : 136
-O2 -march=znver2 -fno-math-errno : 136
-O2 -fno-math-errno : 7
Build with GCC-10.3:
-O2 -march=znver2 -funroll-all-loops -fno-math-errno -ftree-vectorize : 136
Every OpenBLAS version was built manually using the GCC/11.3.0 or GCC/10.3.0 module (no FlexiBLAS involved).
Way to reproduce:
$ wget https://github.com/Reference-LAPACK/lapack/archive/refs/tags/v3.10.1.tar.gz
$ tar -xf v3.10.1.tar.gz
$ cd lapack-3.10.1
$ cp make.inc.example make.inc
$
$ # Modify make.inc by removing paths to BLASLIB, CBLASLIB, TMGLIB and LAPACKELIB
$ # Change LAPACKLIB to, e.g. $(EBROOTOPENBLAS)/lib/libopenblas.so
$
$ cd TESTING
$ make
$ cd ..
$ lapack_testing.py -p x -t eig