LAPACK tests are failing with OpenBLAS-0.3.20 and GCC-11.3.0 #16380

@maxim-masterov

Creating this issue to properly log all the progress.

How it started

It was observed that the VASP6 installation built with foss/2022a led to inaccurate results. After some digging, the culprit was found: the DGGEV subroutine from LAPACK. To simplify debugging, we isolated the LAPACK tests from the official netlib distribution (3.10.1) and ran them with different combinations of compiler flags and OpenBLAS versions.

What we have

The following tests are performed on AMD EPYC ROME (zen2 architecture):

  • OpenBLAS/0.3.15-GCC-10.3.0 (taken from foss/2021a) results in ~130 failed tests:
[wimr@int1 OUTPUT]$ grep failed foss-2021a-openblas-0.3.15/* | grep -v "error exits"
foss-2021a-openblas-0.3.15/ced.out: CEV:    4 out of  1096 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/ced.out: CVX:   24 out of  5196 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/cgd.out: CGV drivers:      5 out of   1092 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/cgd.out: CGV drivers:      6 out of   1092 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/zed.out: ZEV:    8 out of  1100 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/zed.out: ZVX:   36 out of  5208 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/zgd.out: ZGV drivers:     25 out of   1092 tests failed to pass the threshold
foss-2021a-openblas-0.3.15/zgd.out: ZGV drivers:     22 out of   1092 tests failed to pass the threshold 
  • OpenBLAS/0.3.20-GCC-11.3.0 (taken from foss/2022a) results in ~4.2k failed tests:
[wimr@int1 OUTPUT]$ grep failed foss-2022a-openblas-0.3.20/* | grep -v "error exits"
foss-2022a-openblas-0.3.20/ced.out: CEV:   30 out of  1122 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/ced.out: CVX:  194 out of  5366 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgd.out: CGV drivers:    129 out of   1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgd.out: CGV drivers:    135 out of   1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgd.out: CGS drivers:    123 out of   1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgd.out: CGS drivers:    126 out of   1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgg.out: CGG:  119 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgg.out: CGG:  115 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgg.out: CGG:  117 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/cgg.out: CGG:  116 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgd.out: DGS drivers:    129 out of   1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgd.out: DGS drivers:    129 out of   1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgd.out: DGV drivers:    166 out of   1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgd.out: DGV drivers:    171 out of   1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgg.out: DGG:  143 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgg.out: DGG:  146 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgg.out: DGG:  163 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/dgg.out: DGG:  150 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgd.out: SGS drivers:    144 out of   1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgd.out: SGS drivers:    132 out of   1560 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgd.out: SGV drivers:    173 out of   1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgd.out: SGV drivers:    186 out of   1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgg.out: SGG:  153 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgg.out: SGG:  140 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgg.out: SGG:  147 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/sgg.out: SGG:  150 out of  2184 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/zed.out: ZEV:   50 out of  1142 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/zed.out: ZVX:  296 out of  5468 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/zgd.out: ZGV drivers:     52 out of   1092 tests failed to pass the threshold
foss-2022a-openblas-0.3.20/zgd.out: ZGV drivers:     50 out of   1092 tests failed to pass the threshold
  • OpenBLAS/0.3.15-GCC-11.3.0 (new build) results in ~4.2k failed tests
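The totals in this list are sums of the per-driver counts reported by the test programs. A minimal Python sketch for producing such a tally, assuming every relevant line follows the `<N> out of <M> tests failed to pass the threshold` pattern shown above (the script and the name `count_failures` are illustrative, not part of the LAPACK distribution):

```python
import re
import sys

# Matches result lines such as
#   "CEV:    4 out of  1096 tests failed to pass the threshold"
# and captures the number of failures and the number of tests run.
# Lines about "error exits" do not contain this phrase, so they are ignored.
FAILED_RE = re.compile(r"(\d+)\s+out of\s+(\d+)\s+tests failed to pass the threshold")


def count_failures(lines):
    """Return (failed, total) summed over all matching result lines."""
    failed = total = 0
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            failed += int(match.group(1))
            total += int(match.group(2))
    return failed, total


if __name__ == "__main__":
    # Usage: python count_failures.py foss-2022a-openblas-0.3.20/*.out
    grand_failed = grand_total = 0
    for path in sys.argv[1:]:
        with open(path) as fh:
            failed, total = count_failures(fh)
        print(f"{path}: {failed} / {total}")
        grand_failed += failed
        grand_total += total
    print(f"total: {grand_failed} failed out of {grand_total} threshold tests")
```

Run over the 0.3.15 output files, this should report the same 130 failures as the eight lines listed for that build, since those are the only matching lines the `grep` found.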

@zao got similar results on a Ryzen 9 3900X (zen2 desktop) when building the LAPACK tests with the full foss/2022a toolchain and a buildenv that picked up FlexiBLAS as the USE_OPTIMIZED_BLAS implementation:

[easybuild@eb-rocky8 build-lapack-ob-0.3.20-benv]$ grep failed TESTING/testing_results.txt
 SGG:  163 out of  2184 tests failed to pass the threshold
 SGG:  159 out of  2184 tests failed to pass the threshold
 SGG:  162 out of  2184 tests failed to pass the threshold
 SGG:  155 out of  2184 tests failed to pass the threshold
 SGS drivers:    144 out of   1560 tests failed to pass the threshold
 SGS drivers:    159 out of   1560 tests failed to pass the threshold
 SGV drivers:    180 out of   1092 tests failed to pass the threshold
 SGV drivers:    184 out of   1092 tests failed to pass the threshold
  STFSM auxiliary routine:     1 out of  7776 tests failed to pass the threshold
 DGG:  161 out of  2184 tests failed to pass the threshold
 DGG:  150 out of  2184 tests failed to pass the threshold
 DGG:  166 out of  2184 tests failed to pass the threshold
 DGG:  151 out of  2184 tests failed to pass the threshold
 DGS drivers:    135 out of   1560 tests failed to pass the threshold
 DGS drivers:    156 out of   1560 tests failed to pass the threshold
 DGV drivers:    174 out of   1092 tests failed to pass the threshold
 DGV drivers:    172 out of   1092 tests failed to pass the threshold
 CEV:   30 out of  1122 tests failed to pass the threshold
 CVX:  194 out of  5366 tests failed to pass the threshold
 CGG:  122 out of  2184 tests failed to pass the threshold
 CGG:  118 out of  2184 tests failed to pass the threshold
 CGG:  129 out of  2184 tests failed to pass the threshold
 CGG:  121 out of  2184 tests failed to pass the threshold
 CGV drivers:    135 out of   1092 tests failed to pass the threshold
 CGV drivers:    121 out of   1092 tests failed to pass the threshold
 CGS drivers:    126 out of   1560 tests failed to pass the threshold
 CGS drivers:    135 out of   1560 tests failed to pass the threshold
 ZHS:    1 out of  1764 tests failed to pass the threshold
 ZHS:    1 out of  1764 tests failed to pass the threshold
 ZHS:    1 out of  1764 tests failed to pass the threshold
 ZHS:    1 out of  1764 tests failed to pass the threshold
 ZEV:   50 out of  1142 tests failed to pass the threshold
 ZVX:  296 out of  5468 tests failed to pass the threshold
 ZGV drivers:     54 out of   1092 tests failed to pass the threshold
 ZGV drivers:     39 out of   1092 tests failed to pass the threshold
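In the listing above, the generalized eigenproblem groups (xGG, xGS and xGV; the xGV driver tests exercise the xGGEV family, which includes DGGEV) fail far more often than the rest. A small Python sketch that sums repeated groups and ranks them by failure rate, assuming the `testing_results.txt` line format shown above (the function names are hypothetical):

```python
import re
from collections import defaultdict

# Captures the group label (e.g. "DGV drivers", "ZHS") and the failed/total counts.
LINE_RE = re.compile(
    r"([A-Z][A-Za-z ]*?):\s+(\d+)\s+out of\s+(\d+)\s+tests failed to pass the threshold"
)


def failure_rates(lines):
    """Sum repeated groups and return {label: (failed, total, rate)}."""
    sums = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            label = match.group(1)
            sums[label][0] += int(match.group(2))
            sums[label][1] += int(match.group(3))
    return {
        label: (failed, total, failed / total if total else 0.0)
        for label, (failed, total) in sums.items()
    }


def print_worst_first(rates):
    """Print the groups sorted by descending failure rate."""
    for label, (failed, total, rate) in sorted(
        rates.items(), key=lambda item: item[1][2], reverse=True
    ):
        print(f"{label:<24} {failed:>5} / {total:<6} {rate:6.1%}")
```

For example, open `TESTING/testing_results.txt` as `fh` and call `print_worst_first(failure_rates(fh))`.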

The main question: are the failing tests caused by FlexiBLAS or by the optimization flags?

Update 1

From @zao:
Stripping -ftree-vectorize from the build flags that buildenv sets (leaving -O2 -march=native) makes it behave, so it's probably the improved vectorizer in GCC 11 exposing some latent problem in OpenBLAS. It wouldn't be the first time... With the flag removed, the remaining failures are:

STFSM auxiliary routine:     1 out of  7776 tests failed to pass the threshold
 CEV:    4 out of  1096 tests failed to pass the threshold
 CVX:   24 out of  5196 tests failed to pass the threshold
 CGV drivers:      5 out of   1092 tests failed to pass the threshold
 CGV drivers:      5 out of   1092 tests failed to pass the threshold
 ZEV:    8 out of  1100 tests failed to pass the threshold
 ZVX:   36 out of  5208 tests failed to pass the threshold
 ZGV drivers:     26 out of   1092 tests failed to pass the threshold
 ZGV drivers:     26 out of   1092 tests failed to pass the threshold
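The workaround amounts to dropping one token from the flag variables that buildenv exports. A minimal Python sketch, assuming buildenv puts the flags in CFLAGS, CXXFLAGS, FFLAGS, FCFLAGS and F90FLAGS (that variable list and the helper name `strip_flag` are assumptions for illustration). The returned environment can be passed to `subprocess.run(..., env=...)`; note that CMake only reads CFLAGS/FFLAGS on the first configure of a fresh build directory.

```python
import os
import shlex

# Flag variables assumed to be exported by the buildenv module.
FLAG_VARS = ("CFLAGS", "CXXFLAGS", "FFLAGS", "FCFLAGS", "F90FLAGS")


def strip_flag(env, flag, names=FLAG_VARS):
    """Return a copy of env with every occurrence of `flag` removed from `names`."""
    new_env = dict(env)  # never mutate the caller's mapping (e.g. os.environ)
    for name in names:
        if name in new_env:
            # Simple whitespace tokenization; fine for plain compiler flags.
            kept = [tok for tok in shlex.split(new_env[name]) if tok != flag]
            new_env[name] = " ".join(kept)
    return new_env


if __name__ == "__main__":
    # Show what the current flags look like with -ftree-vectorize removed.
    cleaned = strip_flag(os.environ, "-ftree-vectorize")
    for name in FLAG_VARS:
        if name in cleaned:
            print(f"{name}={cleaned[name]}")
```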

Update 2

From @zao:
I've set up a fresh environment on a Haswell machine and got the same degree of breakage as on our zen2 machines, so it's not µarch-dependent. Steps:

Build and install buildenv-default-GCC-11.3.0.eb, then:
$ ml GCC/11.3.0 OpenBLAS/0.3.20 CMake/3.23.1
$ ml buildenv  # defines all the various flags variables to "-O2 -ftree-vectorize -march=native"
$ tar xf v3.10.1.tar.gz  # extract LAPACK sources
$ cmake -B build-tests lapack-3.10.1/ -DUSE_OPTIMIZED_BLAS=ON -DBUILD_TESTING=ON -DBLAS_LIBRARIES=$EBROOTOPENBLAS/lib/libopenblas.so
$ cmake --build build-tests -j 4 && cmake --build build-tests -t test
$ (cd lapack-3.10.1; ./lapack_testing.py; grep failed TESTING/testing_results.txt)
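The cmake steps above can be scripted so they are easy to repeat for other toolchain modules. A minimal Python sketch that mirrors the three cmake invocations and only prints them unless `dry_run=False` is passed. It assumes the module environment (GCC, OpenBLAS, CMake, buildenv) is already loaded and falls back to a placeholder path when `EBROOTOPENBLAS` is unset; the function names are hypothetical.

```python
import os
import shlex
import subprocess


def lapack_test_commands(src_dir, build_dir, blas_lib, jobs=4):
    """Return the configure/build/test commands from the steps above as argument lists."""
    return [
        ["cmake", "-B", build_dir, src_dir,
         "-DUSE_OPTIMIZED_BLAS=ON", "-DBUILD_TESTING=ON",
         f"-DBLAS_LIBRARIES={blas_lib}"],
        ["cmake", "--build", build_dir, "-j", str(jobs)],
        ["cmake", "--build", build_dir, "-t", "test"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it (stopping on failure) only if dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "/path/to/openblas" is a placeholder used only when the module is not loaded.
    root = os.environ.get("EBROOTOPENBLAS", "/path/to/openblas")
    blas = os.path.join(root, "lib", "libopenblas.so")
    run(lapack_test_commands("lapack-3.10.1", "build-tests", blas))
```

The `lapack_testing.py` summary step is left as in the transcript, since it runs from the source tree.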

Update 3

From @zao:

Ran some exhaustive tests on zen2 from GCC 9.5.0 through GCC 12.2.0 with OpenBLAS 0.3.20. It's not looking great.
I'll try to provide data later, but it seems that starting with GCC 12 we get elevated test error rates even without -ftree-vectorize, although builds with the flag show fewer categories of test errors than GCC 11 builds do.
Interestingly, even on the 9.5 and 10 series the error counts differ slightly with and without the flag. I don't know enough about this test suite to tell whether any errors at all are a problem.
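To repeat this comparison systematically, each toolchain can be paired with one build with and one without -ftree-vectorize. The exact list of GCC versions tested isn't given here, so the versions below are an assumed example subset (the endpoints 9.5.0 and 12.2.0 plus the 10.3.0 and 11.3.0 versions used elsewhere in this issue), and `-O2 -march=native` is taken from the buildenv flags. The build-directory naming scheme is hypothetical.

```python
import itertools

# Assumed example subset of toolchains; not the exact list that was tested.
GCC_VERSIONS = ["9.5.0", "10.3.0", "11.3.0", "12.2.0"]
BASE_FLAGS = "-O2 -march=native"
# Variant tag -> extra flag appended to BASE_FLAGS.
VARIANTS = {"vec": "-ftree-vectorize", "novec": ""}


def build_matrix(gcc_versions=GCC_VERSIONS, variants=VARIANTS):
    """Yield (build_dir, gcc_version, flags) for every toolchain/flag combination."""
    for gcc, (tag, extra) in itertools.product(gcc_versions, variants.items()):
        flags = f"{BASE_FLAGS} {extra}".strip()
        yield f"build-gcc-{gcc}-{tag}", gcc, flags


if __name__ == "__main__":
    for build_dir, gcc, flags in build_matrix():
        print(f"{build_dir}: GCC/{gcc} FFLAGS='{flags}'")
```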

Update 4

I got the following numbers of numerical errors using lapack_testing.py -p x -t eig from the LAPACK distribution:

Build with GCC-11.3:

  • -O2 -march=znver2 -funroll-all-loops -fno-math-errno -ftree-vectorize : 4090
  • -O2 -march=znver2 -funroll-all-loops -fno-math-errno : 136
  • -O2 -march=znver2 -fno-math-errno : 136
  • -O2 -fno-math-errno : 7

Build with GCC-10.3:

  • -O2 -march=znver2 -funroll-all-loops -fno-math-errno -ftree-vectorize : 136

Every OpenBLAS version was built manually using the GCC/11.3.0 or GCC/10.3.0 module (no FlexiBLAS involved).
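The GCC-11.3 configurations above are nested: each one drops a single flag from the previous one, so the effect of each flag can be read off pairwise. A short Python sketch over exactly those four numbers. It attributes the jump from 136 to 4090 to -ftree-vectorize and the step from 7 to 136 to -march=znver2, and shows that -funroll-all-loops makes no difference. It only compares configurations that differ by one flag, so it cannot reveal interactions between flags.

```python
import itertools

# Error counts for the GCC-11.3 builds listed above (lapack_testing.py -p x -t eig).
RESULTS = {
    frozenset({"-O2", "-march=znver2", "-funroll-all-loops", "-fno-math-errno", "-ftree-vectorize"}): 4090,
    frozenset({"-O2", "-march=znver2", "-funroll-all-loops", "-fno-math-errno"}): 136,
    frozenset({"-O2", "-march=znver2", "-fno-math-errno"}): 136,
    frozenset({"-O2", "-fno-math-errno"}): 7,
}


def single_flag_effects(results):
    """Return (flag, errors_with, errors_without) for every pair of
    configurations that differ by exactly one added flag."""
    effects = []
    for with_cfg, without_cfg in itertools.permutations(results, 2):
        added = with_cfg - without_cfg
        # Keep only pairs where `without_cfg` is `with_cfg` minus a single flag.
        if without_cfg < with_cfg and len(added) == 1:
            (flag,) = added
            effects.append((flag, results[with_cfg], results[without_cfg]))
    return effects


if __name__ == "__main__":
    for flag, with_n, without_n in single_flag_effects(RESULTS):
        print(f"{flag:<20} {without_n:>5} -> {with_n:>5} ({with_n - without_n:+d})")
```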

Way to reproduce:

$ wget https://github.com/Reference-LAPACK/lapack/archive/refs/tags/v3.10.1.tar.gz
$ tar -xf v3.10.1.tar.gz
$ cd lapack-3.10.1
$ cp make.inc.example make.inc
$
$ # Modify make.inc by removing paths to BLASLIB, CBLASLIB, TMGLIB and LAPACKELIB
$ # Change LAPACKLIB to, e.g. $(EBROOTOPENBLAS)/lib/libopenblas.so
$
$ cd TESTING
$ make
$ cd ..
$ ./lapack_testing.py -p x -t eig
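The make.inc edits described in the comments above can also be applied with a script. A minimal Python sketch, assuming the variables appear as plain `NAME = value` assignments as in make.inc.example. It blanks BLASLIB, CBLASLIB, TMGLIB and LAPACKELIB and points LAPACKLIB at OpenBLAS; the script name `patch_make_inc.py` is hypothetical.

```python
import re
import sys
from pathlib import Path

# Variables whose paths should be removed, per the steps above.
CLEAR_VARS = ("BLASLIB", "CBLASLIB", "TMGLIB", "LAPACKELIB")
# make expands $(EBROOTOPENBLAS) from the environment when the module is loaded.
OPENBLAS_LIB = "$(EBROOTOPENBLAS)/lib/libopenblas.so"


def patch_make_inc(text, lapacklib):
    """Blank the CLEAR_VARS assignments and set LAPACKLIB in make.inc text."""
    out = []
    for line in text.splitlines():
        # Only plain "NAME = value" lines match; comments and "+=" are left alone.
        match = re.match(r"\s*([A-Z_]+)\s*=", line)
        name = match.group(1) if match else None
        if name in CLEAR_VARS:
            line = f"{name} ="
        elif name == "LAPACKLIB":
            line = f"LAPACKLIB = {lapacklib}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Usage: python patch_make_inc.py make.inc
    for arg in sys.argv[1:]:
        path = Path(arg)
        path.write_text(patch_make_inc(path.read_text(), OPENBLAS_LIB))
```

Run it right after the `cp make.inc.example make.inc` step, before building in TESTING.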
