Skip to content

Commit 6b2ec5f

Browse files
committed
Vert friction: Force FMAs in tridiag solvers
Switching the vertical friction loops from k/j/i to j/i/k replaced the evaluation of `b1` by FMA with a simpler version, causing an answer change when FMAs are enabled. Although less efficient, this patch adds an always-false loop to trick the compiler and force it to always execute `b1` by FMA. Specifically, loops of the following form execute `b1` by FMA. ``` do k=2,nz if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k) c1(k) = dt * CS%a_v(i,J,K) * b1 b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1) ---> b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1)) d1 = b_denom_1 * b1 visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1 enddo ``` Switching to j/i/k ordering allows the Intel compiler to cache `a_[uv](K)` for use in the next iteration of `k` and evaluate `b1` by a single multiplication. If we insert an impossible branch, such as the following: ``` do k=2,nz if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k) c1(k) = dt * CS%a_v(i,J,K) * b1 b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1) b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1)) d1 = b_denom_1 * b1 visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1 ---> if (dt < 0) exit enddo ``` then it blocks the lookahead logic of the compiler and forces the FMA execution as in the k/j/i version. There is a moderate impact on performance. ``` Before: hits tmin tmax tavg tstd tfrac grain pemin pemax (Ocean vertical viscosity) 300 2.717543 3.805039 3.523935 0.174203 0.064 31 0 511 ``` ``` After: hits tmin tmax tavg tstd tfrac grain pemin pemax (Ocean vertical viscosity) 300 2.780148 3.999669 3.761651 0.210061 0.069 31 0 511 ``` so this should only be considered a temporary fix until FMA answer changes are permitted.
1 parent 757b2d8 commit 6b2ec5f

1 file changed

Lines changed: 12 additions & 0 deletions

File tree

src/parameterizations/vertical/MOM_vert_friction.F90

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -753,6 +753,9 @@ subroutine vertvisc(u, v, h, forces, visc, dt, OBC, ADp, CDp, G, GV, US, CS, &
753753
ADp%du_dt_str(I,j,k) = (CS%h_u(I,j,k) * ADp%du_dt_str(I,j,k) &
754754
+ dt * CS%a_u(I,j,K) * ADp%du_dt_str(I,j,k-1)) * b1
755755
endif
756+
757+
!### Force FMA evaluation of b1 by blocking lookahead with an impossible branch.
758+
if (dt < 0) exit
756759
enddo
757760

758761
if (associated(ADp%du_dt_str)) then
@@ -944,6 +947,9 @@ subroutine vertvisc(u, v, h, forces, visc, dt, OBC, ADp, CDp, G, GV, US, CS, &
944947
ADp%dv_dt_str(i,J,k) = (CS%h_v(i,J,k) * ADp%dv_dt_str(i,J,k) &
945948
+ dt * CS%a_v(i,J,K) * ADp%dv_dt_str(i,J,k-1)) * b1
946949
endif
950+
951+
!### Force FMA evaluation of b1 by blocking lookahead with an impossible branch.
952+
if (dt < 0) exit
947953
enddo
948954

949955
if (associated(ADp%dv_dt_str)) then
@@ -1206,6 +1212,9 @@ subroutine vertvisc_remnant(visc, visc_rem_u, visc_rem_v, dt, G, GV, US, CS)
12061212
b1 = 1.0 / (b_denom_1 + dt * CS%a_u(I,j,K+1))
12071213
d1 = b_denom_1 * b1
12081214
visc_rem_u(I,j,k) = (CS%h_u(I,j,k) + dt * CS%a_u(I,j,K) * visc_rem_u(I,j,k-1)) * b1
1215+
1216+
!### Force FMA evaluation of b1 by blocking lookahead with an impossible branch.
1217+
if (dt < 0) exit
12091218
enddo
12101219

12111220
do k=nz-1,1,-1
@@ -1232,6 +1241,9 @@ subroutine vertvisc_remnant(visc, visc_rem_u, visc_rem_v, dt, G, GV, US, CS)
12321241
b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1))
12331242
d1 = b_denom_1 * b1
12341243
visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1
1244+
1245+
!### Force FMA evaluation of b1 by blocking lookahead with an impossible branch.
1246+
if (dt < 0) exit
12351247
enddo
12361248

12371249
do k=nz-1,1,-1

0 commit comments

Comments
 (0)