When debugging or optimizing the FDTD kernel, verifying that the code is still correct is essential. One of the best validation methods is to check that all floating-point results are bit-identical to those of a previous version. However, floating-point arithmetic is inexact: how each result is rounded, and therefore its least significant bits, depends on the compiler's code generation and on the FPU settings.
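A bit-identical check has to compare bit patterns rather than use `==`, because `-0.0f == 0.0f` is true and `NaN == NaN` is false, so `==` can both hide and invent differences. Below is a minimal sketch of such a comparison. It assumes the field data is stored as contiguous `float` arrays, and `first_bit_mismatch()` is a hypothetical helper, not existing openEMS code:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of openEMS): returns the index of the first
// element whose bit pattern differs, or -1 if all n elements are bit-identical.
// Unlike operator==, this distinguishes +0.0f from -0.0f and compares NaNs by
// their bit pattern instead of always reporting them as unequal.
std::ptrdiff_t first_bit_mismatch(const float* ref, const float* cur, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t a, b;
        std::memcpy(&a, &ref[i], sizeof a);  // well-defined way to read the bits
        std::memcpy(&b, &cur[i], sizeof b);
        if (a != b)
            return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}
```

For example, with `ref = {1.0f, 0.0f, 3.5f}` and `cur = {1.0f, -0.0f, 3.5f}`, `first_bit_mismatch(ref, cur, 3)` returns 1: the two values compare equal, but their sign bits differ.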
In the openEMS codebase, there are two obstacles that may prevent numerical reproducibility.
- FPU Denormal Flush-to-Zero: On x86_64 CPUs, operations on very small (denormal) floating-point values require microcode assists, which carry a heavy performance penalty. It is therefore standard practice to enable Flush-to-Zero, which replaces such values with zero and thereby changes the results. For troubleshooting, there should be an advanced command-line flag to disable it.
- FP Contraction: The compiler may reorder or restructure floating-point operations in several ways. The best-known ones are GCC's `-funsafe-math-optimizations` and `-Ofast`. But even with only safe math optimizations, the compiler may contract a multiplication and an addition into a single FMA instruction, which rounds once instead of twice and therefore changes the result. This behavior can be disabled with a command-line flag that turns off FP contraction (`-ffp-contract=off` in GCC and Clang). This flag should be added to the default CXXFLAGS in a CMake debug build.
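Both effects can be reproduced outside openEMS with a small C++ sketch. It assumes an x86_64 target where `float`/`double` arithmetic goes through SSE, so the MXCSR Flush-to-Zero (FTZ) and Denormals-Are-Zero (DAZ) bits apply. The names `set_flush_to_zero()`, `denormal_probe()`, `mul_add_demo()` and `MulAddResult` are hypothetical, not existing openEMS code:

```cpp
#include <cmath>        // std::fma, std::ldexp
#include <limits>       // std::numeric_limits
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE (SSE)
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE (SSE3)

// Hypothetical helper: toggles FTZ and DAZ in the MXCSR register of the
// calling thread only.
void set_flush_to_zero(bool enable)
{
    const unsigned int ftz = enable ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF;
    const unsigned int daz = enable ? _MM_DENORMALS_ZERO_ON : _MM_DENORMALS_ZERO_OFF;
    _MM_SET_FLUSH_ZERO_MODE(ftz);
    _MM_SET_DENORMALS_ZERO_MODE(daz);
}

// Produces a denormal result at run time: the smallest normal float times 0.5.
// The volatile operands keep the compiler from folding it at compile time.
float denormal_probe()
{
    volatile float tiny = std::numeric_limits<float>::min();
    volatile float half = 0.5f;
    return tiny * half;  // about 5.9e-39 with FTZ off, exactly 0.0f with FTZ on
}

struct MulAddResult {
    double separate;  // a * a rounded, then + c rounded (two roundings)
    double fused;     // fma(a, a, c)                    (one rounding)
};

// a * a is exactly 1 + 2^-26 + 2^-54. Rounding the product to double drops
// the 2^-54 term; an FMA keeps it until the single final rounding.
MulAddResult mul_add_demo()
{
    const double a = 1.0 + std::ldexp(1.0, -27);
    const double c = -(1.0 + std::ldexp(1.0, -26));
    volatile double prod = a * a;  // volatile store forces a separate rounding step
    const double separate = prod + c;
    const double fused = std::fma(a, a, c);
    return {separate, fused};
}
```

`mul_add_demo()` returns 0.0 for the separately rounded multiply-add and 2^-54 for the fused one. Passing `-ffp-contract=off` to GCC or Clang stops the compiler from fusing an ordinary `a * b + c` expression into an FMA on its own; explicit `std::fma()` calls remain fused. Because MXCSR is per-thread state, `set_flush_to_zero()` only affects the thread that calls it, so the safest choice is to call it in every engine thread.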
Implementing both features is not sufficient for numerical reproducibility (in my experience, the SSE engine's VV, VI, IV, II operator compression still changes the order of computation), but it is necessary.
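A changed order of computation is enough to break bit-identity on its own, because floating-point addition is not associative. A minimal illustration with three `float` values follows. `sum_two_orders()` is a hypothetical name, and the sketch assumes the code is not built with `-ffast-math`, which would allow the compiler to regroup the sums itself:

```cpp
// Hypothetical demo: the same three terms summed with two different groupings.
struct SumOrders {
    float left;   // (a + b) + c
    float right;  // a + (b + c)
};

SumOrders sum_two_orders(float a, float b, float c)
{
    return {(a + b) + c, a + (b + c)};
}
```

With `a = 1e8f`, `b = -1e8f` and `c = 1.0f`, `left` is 1.0f while `right` is 0.0f. Adjacent floats near 1e8 are 8 apart, so `b + c` rounds back to `-1e8f` and the 1.0f is lost.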
Both features have existed in my private codebase for over a year, where I use them to develop and test my never-finished engine rewrite project. I'm now posting this issue as both a developer note and a TODO list for myself to eventually upstream them.