Commit 6b2ec5f
Vert friction: Force FMAs in tridiag solvers
Switching the vertical friction loops from k/j/i to j/i/k ordering replaced the
FMA evaluation of `b1` with a simpler one, which changes answers when FMAs are
enabled.
Although it is less efficient, this patch adds an always-false exit condition
to these loops, which tricks the compiler into always evaluating `b1` by FMA.
Specifically, `b1` is evaluated by FMA on the marked line of loops of the
following form:
```fortran
do k=2,nz
  if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k)
  c1(k) = dt * CS%a_v(i,J,K) * b1
  b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1)
  b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1))  ! <--- b1 evaluated by FMA
  d1 = b_denom_1 * b1
  visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1
enddo
```
Switching to j/i/k ordering allows the Intel compiler to cache `a_[uv](K)` for
use in the next iteration of `k` and evaluate `b1` by a single multiplication.
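One way to see why this changes answers: an FMA rounds once, while a product that is computed (and cached) separately and then added rounds twice. Writing fl(x) for x rounded to working precision, b_d for `b_denom_1`, Δt for `dt`, and a_{K+1} for `CS%a_v(i,J,K+1)` (notation introduced here, not taken from the code), and assuming that what the compiler caches is the product `dt * CS%a_v(i,J,K+1)` (our reading, not confirmed by this commit), the denominator of `b1` is evaluated in one of two ways:

```latex
\begin{aligned}
\text{FMA:}            &\quad \mathrm{fl}\bigl(b_d + \Delta t\,a_{K+1}\bigr) \\
\text{cached product:} &\quad \mathrm{fl}\bigl(b_d + \mathrm{fl}(\Delta t\,a_{K+1})\bigr)
\end{aligned}
```

The two agree whenever Δt·a_{K+1} is exactly representable, since then fl(Δt·a_{K+1}) = Δt·a_{K+1}. Otherwise they can differ in the last bit, which is enough to break bit-for-bit reproducibility of answers.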
If we insert an impossible branch, such as the following:
```fortran
do k=2,nz
  if (allocated(visc%Ray_v)) Ray = visc%Ray_v(i,J,k)
  c1(k) = dt * CS%a_v(i,J,K) * b1
  b_denom_1 = CS%h_v(i,J,k) + dt * (Ray + CS%a_v(i,J,K) * d1)
  b1 = 1.0 / (b_denom_1 + dt * CS%a_v(i,J,K+1))
  d1 = b_denom_1 * b1
  visc_rem_v(i,J,k) = (CS%h_v(i,J,k) + dt * CS%a_v(i,J,K) * visc_rem_v(i,J,k-1)) * b1
  if (dt < 0) exit  ! <--- impossible branch
enddo
```
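then the branch blocks the compiler's lookahead logic and forces `b1` to be
evaluated by FMA, as in the k/j/i version.
There is a moderate impact on performance: the mean time (`tavg`) of the
`(Ocean vertical viscosity)` clock rises from 3.523935 to 3.761651, an
increase of about 7%.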
```
                                     hits  tmin      tmax      tavg      tstd      tfrac  grain  pemin  pemax
Before  (Ocean vertical viscosity)   300   2.717543  3.805039  3.523935  0.174203  0.064  31     0      511
After   (Ocean vertical viscosity)   300   2.780148  3.999669  3.761651  0.210061  0.069  31     0      511
```
This should therefore only be considered a temporary fix until FMA-related
answer changes are permitted.

1 parent 757b2d8
1 file changed, 12 additions and 0 deletions. The added lines form four 3-line
hunks, at lines 756-758, 950-952, 1215-1217, and 1244-1246 of the new file.