homme HIP changes#5039
Conversation
Conflicts: components/homme/src/share/compose/compose_cedr.cpp components/homme/src/share/cxx/HybridVCoord.cpp components/homme/src/theta-l_kokkos/cxx/HyperviscosityFunctorImpl.cpp components/homme/src/theta-l_kokkos/prim_driver_mod.F90
|
|
||
| INCLUDE(CheckCXXCompilerFlag) | ||
| CHECK_CXX_COMPILER_FLAG("-std=c++14" CXX14_SUPPORTED) | ||
| CHECK_CXX_COMPILER_FLAG("-std=c++17" CXX14_SUPPORTED) |
There was a problem hiding this comment.
here is a typo? clean this up
There was a problem hiding this comment.
Sure, assuming all the tests pass on all the needed machines. My guess is they will.
There was a problem hiding this comment.
Does the C++17 requirement come from Crusher or HIP? As far as I can tell, the current Kokkos version is still OK with C++14 for other platforms.
There was a problem hiding this comment.
I think Kokkos is "close" to switch to C++17 required. IIRC, the next release (3.7, due in Sept) will require it. If it's a deal breaker for HIP, we might think to anticipate kokkos, and switch to C++17 already. I don't think there's a compelling reason to stick with C++14 for the current production clusters, no?
There was a problem hiding this comment.
i think i needed c++17 for hipcc.
| SET(CMAKE_C_COMPILER "mpicc" CACHE STRING "") | ||
| SET(CMAKE_Fortran_COMPILER "mpifort" CACHE STRING "") | ||
| SET(CMAKE_CXX_COMPILER "/ccs/home/onguba/kokkos/bin/nvcc_wrapper" CACHE STRING "") | ||
| SET(CMAKE_CXX_COMPILER "/ccs/home/onguba/acme-MASTER-GB/externals/kokkos/bin/nvcc_wrapper" CACHE STRING "") |
|
|
||
| SET(USE_QUEUING FALSE CACHE BOOL "") | ||
|
|
||
| SET(ENABLE_CUDA FALSE CACHE BOOL "") |
There was a problem hiding this comment.
I do not see variable ENABLE_CUDA used anywhere, only KOKKOS_ENABLE_CUDA is used. I will remove it from all machine files.
|
|
||
| SET (HOMMEXX_EXEC_SPACE "Default" CACHE STRING "Select the kokkos exec space") | ||
|
|
||
| #is HOMMEXX_EXEC_SPACE set anywhere? in scream? |
There was a problem hiding this comment.
Please remove dev comments to yourself.
|
|
||
| # needed for CPRNC build system | ||
| set (cprnc_dummy_file "${CMAKE_CURRENT_SOURCE_DIR}/utils/cime/tools/cprnc/Macros.cmake") | ||
| set (cprnc_dummy_file "${CMAKE_CURRENT_SOURCE_DIR}/utils/cime/CIME/non_py/cprnc/Macros.cmake") |
There was a problem hiding this comment.
Note to other reviewers: These CIME/non_py lines follow a commit already in master.
|
|
||
| IF (HOMME_USE_KOKKOS) | ||
| TARGET_LINK_LIBRARIES(${execName} kokkos) | ||
| link_to_kokkos(${execName}) |
There was a problem hiding this comment.
Note to other reviewers: I reorganized things a bit so that link_to_kokkos takes care of the library linking.
| SET (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14") | ||
| CHECK_CXX_COMPILER_FLAG("-std=c++17" CXX17_SUPPORTED) | ||
| IF (${CXX17_SUPPORTED}) | ||
| SET (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17") |
There was a problem hiding this comment.
Are we sure C++17 is actually needed? I had the impression that it was thought Crusher requires it but then it turns out it does not.
There was a problem hiding this comment.
Now I do not remember why it is here. Do you prefer 14?
There was a problem hiding this comment.
For cleanliness we should not mandate more than is technically required. I believe for a while weaver could not handle C++17, but that may have changed now.
| SET(CMAKE_Fortran_COMPILER "ftn" CACHE STRING "") | ||
| SET(CMAKE_CXX_COMPILER "hipcc" CACHE STRING "") | ||
| #SET(CMAKE_CXX_COMPILER "/ccs/home/onguba/kokkos/bin/nvcc_wrapper" CACHE STRING "") | ||
| SET(E3SM_KOKKOS_PATH "/ccs/home/onguba/kokkos-example-spock-hipcc2/bld-hipcc" CACHE STRING "") |
There was a problem hiding this comment.
I had removed dependence on onguba-based things in the crusher file. spock is I think irrelevant at this point, so I don't mind this line. But we could also consider not including the spock file in this PR. Up to you, though.
| #if defined(HOMMEXX_CUDA_SPACE) || \ | ||
| (defined(HOMMEXX_DEFAULT_SPACE) && defined(KOKKOS_ENABLE_CUDA)) // Can't use GPTL timers on CUDA | ||
| //OG not sure about timers and HIP, probably the same as with CUDA | ||
| //#if defined(HOMMEXX_HIP_SPACE) || defined(HOMMEXX_CUDA_SPACE) || \ |
There was a problem hiding this comment.
Please remove the commented-out dev code.
bartgol
left a comment
There was a problem hiding this comment.
Looks good. I have a few suggestions, that are mostly style related. Feel free to disregard any or all of them.
|
|
||
| SET (HOMMEXX_EXEC_SPACE "Default" CACHE STRING "Select the kokkos exec space") | ||
|
|
||
| #is HOMMEXX_EXEC_SPACE set anywhere? in scream? |
There was a problem hiding this comment.
Doesn't have to be set. The "Default" option usually works. The only case where it is useful to set it manually is, e.g., if Cuda or OpenMP is enabled in kokkos, but one wants to use Serial instead.
| IF (${CXX14_SUPPORTED}) | ||
| SET (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14") | ||
| CHECK_CXX_COMPILER_FLAG("-std=c++17" CXX17_SUPPORTED) | ||
| IF (${CXX17_SUPPORTED}) |
There was a problem hiding this comment.
FYI, some cmake might issue a warning here, depending on policies settings. Unless old policy settings are used, you should get a warning like
if given arguments: "TRUE" An argument named "TRUE" appears in a conditional statement.
The clean cmake syntax is to use if (VAR) rather than if (${VAR}).
| @@ -932,7 +932,7 @@ struct CaarFunctorImpl { | |||
| // CUDA version | |||
There was a problem hiding this comment.
This should prob be changed to GPU version...
| @@ -986,7 +986,7 @@ struct CaarFunctorImpl { | |||
| // Non-CUDA version | |||
There was a problem hiding this comment.
...and this to Non-GPU version.
| * See the file 'COPYRIGHT' in the HOMMEXX/src/share/cxx directory | ||
| *******************************************************************************/ | ||
|
|
||
| #include "Config.hpp" |
There was a problem hiding this comment.
For all ComposeTransport*.cpp files, perhaps it would be easier to add the source files to THETAL_DEPS_CXX in the CMakeLists.txt only if HOMME_ENABLE_COMPOSE is ON, and avoid CPP logic in the source/header files.
| static | ||
| KOKKOS_INLINE_FUNCTION | ||
| typename std::enable_if<!std::is_same<ExecSpaceType,Hommexx_Cuda>::value && | ||
| typename std::enable_if<!std::is_same<ExecSpaceType,HommexxGPU>::value && |
There was a problem hiding this comment.
Is this basically checking that ExecSpaceType is Serial? Should we perhaps just do
typename std::enable_if<!std::is_same<ExecSpaceType,Kokkos::Serial>::value, int> ::type
?
There was a problem hiding this comment.
Serial would be too restrictive. This is checking that it's a non-GPU space.
There was a problem hiding this comment.
But after this there is "and not OpenMP"...
There was a problem hiding this comment.
Good point. But technically, it's trying to catch the cases that are excluded by the two specializations below it, so IMO it's more accurate the way it is. In practice, though, Serial would be OK.
There was a problem hiding this comment.
Either way is fine, I guess. If future archs support (like sycl) makes this chunk hard to read, we can revisit.
| ${SRC_SHARE_DIR}/compose | ||
| ${CMAKE_BINARY_DIR}/src/share/cxx | ||
| ) | ||
| IF (NOT HOMME_ENABLE_COMPOSE) |
There was a problem hiding this comment.
I find this backward logic a bit hard to parse. Perhaps we could do the other way around: not add compose dir in the SET statement above, and replace this IF with
if (HOMME_ENABLE_COMPOSE)
add compose
endif()
| ${SRC_SHARE_DIR}/planar_mesh_mod.F90 | ||
| ) | ||
|
|
||
| IF (NOT HOMME_ENABLE_COMPOSE) |
There was a problem hiding this comment.
Same here: it's easier to to the other way around.
| #include "utilities/ViewUtils.hpp" | ||
|
|
||
| #if ! defined(NDEBUG) | ||
| #define RESOLVE_ISSUE_WITH_ASSERTS |
There was a problem hiding this comment.
Would you mind adding a couple of lines of comment explaining what the issue with asserts is? I'm afraid someone 3 years from now will stumble on this while doing some maintenance and have no clue what's going on.
There was a problem hiding this comment.
I will add a comment. I believe it is a common case in scream too? in debug regime with asserts team sizes are too big.
There was a problem hiding this comment.
Ah, yes, that one. It showed up in scream as well, yes.
| SET (COMMON_DEFINITIONS NP=4 NC=4) | ||
| IF (CUDA_BUILD) | ||
| SET(COMMON_DEFINITIONS ${COMMON_DEFINITIONS} CUDA_BUILD) | ||
| IF (CUDA_BUILD OR HIP_BUILD) |
There was a problem hiding this comment.
Could simply do IF (HOMMEXX_ENABLE_GPU), like it's done elsewhere, so it's ready if/when we add SYCL.
There was a problem hiding this comment.
Thanks -- this one was discussed before, there is CUDA_BUILD used elsewhere in homme cmake. It can be cleaned up later.
|
I have a request, but feel free to disregard and punt to a future PR: in scream, we keep getting warnings due to deprecated code in Kokkos related to MDRange. In particular, these lines in with in |
|
This commit: Crusher GPU 2 execs, 72 and 128 levels, 10 tracers, ne4, 1,8,16,24,and 32 ranks. theta-l-nlev72-kokkos crashes in 1-rank run with mpich error after a few time steps (i haven't run 1 rank with it before). the rest is ok. this one is built with extra diagnostics arrays. theta-nlev128-kokkos -- all are ok. |
|
Weaver -- theta-f* ne2 * tests for F vs CXX all pass for this commit and unit tests, too. |
Conflicts: components/homme/test_execs/preqx_kokkos_ut/CMakeLists.txt components/homme/test_execs/thetal_kokkos_ut/CMakeLists.txt
|
I merged yesterday's master and got these fails on chrysalis from e3sm_integration: which i cannot explain. Still waiting for cdash to come up with today's nightlies. From a few tests that I looked at, all normalized diffs seem to be small. homme and hommebfb pass. |
|
@oksanaguba what is the status of this PR? |
|
I had some issues testing it, trying to resolve them now. |
|
Testing this branch with e3sm_developer on chrysalis, only NL diffs are due to layouts, which differ for the baselines ran through jenkins (explanation is in infrastructure-dev channel): All NL diffs are due to layouts. |
Many changes for HIP. Adding a new macro, HOMMEXX_ENABLE_GPU, redoing templates to generalize CUDA/HIP code. Arch-dependent code (very few lines) still uses kokkos macros. Regarding HIP -- standalone homme with output builds and runs, but no eye ball norm tests wrt output were performed. Also, not performed: bfb tests, unit tests (not tried to run), investigating bfb flags. This is for future PRs. Fixes #5069 for team sizes too large with asserts and nans in reprosums for cxx execs with diagnostics macro ENERGY_DIAGNOSTICS. [bfb] for nightlies and v100.
|
in next |
|
Can you take a look and fix these multi-threaded cases for pgi on either Ascent or Summit?
|
|
Thanks -- i see a message from kokkos, not sure what it is , i will try with debug: |
HIP can't handle the reference, but we still need it for the F90 dycore with HORIZ_OPENMP enabled, since in that code path we enter all of our routines within a threaded region and so cannot create new Kokkos::Serial views during time stepping. (Recall Kokkos::Serial is necessary b/c Homme's threading is not compatible with Kokkos::OpenMP.) Thus, if COMPOSE_PORT is not defined, take a reference.
|
@amametjanov thanks for letting us know and pointing us to the failing tests. The issue was in one of my commits for this PR. I have fixed the issue, tested it on the five tests you linked, and pushed the fix to this branch. @oksanaguba you can remerge this PR to next when you have a chance. Thanks. |
Many changes for HIP. Adding a new macro, HOMMEXX_ENABLE_GPU, redoing templates to generalize CUDA/HIP code. Arch-dependent code (very few lines) still uses kokkos macros. Regarding HIP -- standalone homme with output builds and runs, but no eye ball norm tests wrt output were performed. Also, not performed: bfb tests, unit tests (not tried to run), investigating bfb flags. This is for future PRs. Fixes #5069 for team sizes too large with asserts and nans in reprosums for cxx execs with diagnostics macro ENERGY_DIAGNOSTICS. [bfb] for nightlies and v100.
|
in next |
Many changes for HIP.
Adding a new macro, HOMMEXX_ENABLE_GPU, redoing templates to generalize CUDA/HIP code. Arch-dependent code (very few lines) still uses kokkos macros.
Regarding HIP -- standalone homme with output builds and runs, but no eye ball norm tests wrt output were performed. Also, not performed: bfb tests, unit tests (not tried to run), investigating bfb flags. This is for future PRs.
Fixes #5069 for team sizes too large with asserts and nans in reprosums for cxx execs with diagnostics macro ENERGY_DIAGNOSTICS.
[bfb] for nightlies and v100.