{lib}[GCCcore/11.2.0,foss/2021b] CuPy v11.4.0, pytest v7.2.2 w/ Python 3.9.6#17526
Conversation
….2-GCCcore-11.2.0.eb
|
Test report by @Micket |
|
Test report by @Micket |
|
Test report by @akesandgren |
|
Test report by @Micket |
|
Test report by @Micket |
|
Damnit, filled up home dir quota with |
|
Test report by @Micket |
|
uh, i have no idea how the pytest test suite failed on the V100 nodes. it worked before. |
| ('cuDNN', '8.2.2.26', versionsuffix, SYSTEM), | ||
| ('NCCL', '2.10.3', versionsuffix), | ||
| ('cuTENSOR', '1.6.1.5', versionsuffix, SYSTEM), | ||
| # ('cuSPARSELt', '0.3.0.3', versionsuffix, SYSTEM), |
There was a problem hiding this comment.
Why is this commented? If it's really not needed, can't it just be removed?
There was a problem hiding this comment.
It was an optional thing, but i had problems getting that to work in my PR. I've forgotten what the reason was, only that i didn't care enough about that optional feature.
There was a problem hiding this comment.
It currently crashes/fails to compile some part, I left in in place to be enabled for a newer toolchain
There was a problem hiding this comment.
Should we put that as a comment? I.e. that it's an optional dep, commented out because it gave issues, but could be re-enabled for newer toolchain?
|
Test report by @casparvl |
|
Test report by @Micket |
|
Test report by @Micket |
|
You can ignore my test, I completely overlooked the dependency on the EasyBlock. I'll re-test... |
|
Test report by @casparvl |
|
Same result, unfortunately... |
|
@casparvl you have an odd test failure in pytest already, can you look closer at that? |
|
@akesandgren: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/4423597098 bleep, bloop, I'm just a bot (boegelbot v20200716.01) |
…ore according to our rules.
|
A40's are failing cupy test |
|
Test report by @Micket |
|
You forgot to include the easyblock, CuPy uses "testinstall" which is fixed in that easyblock |
|
Ah, my bad, will retest. Thanks for the pointer. |
|
I was investigating the TestProduct.test_tensordot failure for the A40. @testing.for_all_dtypes()
@testing.numpy_cupy_allclose()
def test_tensordot(self, xp, dtype):
a = testing.shaped_arange((2, 3, 4), xp, dtype)
b = testing.shaped_arange((3, 4, 5), xp, dtype)
return xp.tensordot(a, b)I can generate the test data manually So, with float16 precision on a T4, it does compute the expected output >>> np.tensordot(a_np, b_np)
array([[2938., 3016., 3094., 3172., 3250.],
[7040., 7264., 7488., 7708., 7928.]], dtype=float16)
>>> cp.tensordot(a, b)
array([[2938., 3016., 3094., 3172., 3250.],
[7040., 7264., 7488., 7708., 7928.]], dtype=float16)Since it wasn't clear from the actual/desired output, i can confirm that my icelake CPUs also produce the same output using numpy. actual = array([[2938., 3016., 3094., 3172., 3250.],
[7044., 7264., 7484., 7708., 7932.]], dtype=float16) # wrong results on A40float16's have enough poor precision that these are actually rounding errors. With higher precision, >>> np.tensordot(np.array(a, dtype=np.float64), np.array(b, dtype=np.float64))
array([[2938., 3016., 3094., 3172., 3250.],
[7042., 7264., 7486., 7708., 7930.]])So these numbers aren't really any worse in terms of floating point accuracy. Annoying that isn't 1 to 1 with numpy, but i think we just have to ignore this test. |
|
Test report by @casparvl |
|
Ok, finally one build that completes. But only when using local scratch disks. Also, I don't see the same test failures that @Micket see's on the A40 (we have A100s). From his comment above, I interpret it as a flaky test though. How do you both think we should proceed on this? I see two options:
And the one from @Micket from
What do you think? |
|
Test report by @Micket |
The values in the range 4096 - 8192 have a precision interval of 4 so the test is incorrectly written, rtol should probably be {'float16': 1e-3} that would handle both ranges. Conclusion, test is wrong and/or very badly choosen. |
|
@akesandgren Yep, that's a clear description. I would suggest to also link this PR, or more specifically this comment #17526 (comment) in there though. That will give people a concrete output to compare their own to if they run into this issue, and provide the full context. |
|
@casparvl does that look good? |
|
Test report by @Micket |
|
It's still runinng the tensordot test. I tried specifying the full path to this particular test, including the ::test_tensordot part, but it still ran and failed it. |
| 'tests/cupy_tests/fft_tests/test_fft.py', | ||
| # float16 has too low precision for these tests as they are written | ||
| # See https://github.com/easybuilders/easybuild-easyconfigs/pull/17526#issuecomment-1470843170 for details. | ||
| 'tests/cupy_tests/linalg_tests/test_product.py::TestProduct', |
There was a problem hiding this comment.
So, turns out this syntax isn't a thing in pytest. This line does nothing. Looks like this syntax pytest-dev/pytest#3198 but this was never supported, the issue was closed in favor of a regex of what to include.
Needs to be a different flag -k "not test_tensordot"
Ugh. This sucks. It won't even let us specify the full name, like "TestProduct::test_tensordot", just the last part.
There was a problem hiding this comment.
So options are
- let this fail on A40, whoever installs it just has to ignore test failures.
- patch this test out
- add the -k flag, which just really sucks if we ever need to add more than one.
Regardless, this broken ::TestProduct ignore line should be removed, as it does nothing
|
Test report by @Micket |
|
@Micket that last commit seem to fix the A40 problem, at least my build on A40's passed cleanly. |
|
Why leave in |
|
It is supported when when using --deselect, according to the actually merged PR for that part |
|
Test report by @Micket |
|
Going in, thanks @akesandgren! |
(created using
eb --new-pr)Supercedes #17136
The problems with the scipy tests need more investigation.
Depends on easybuilders/easybuild-easyblocks#2872