Stabilize calculation of uncertainties for vertices extremely close to beam line #279
Conversation
…x comes from Yuri Fisyak.
Was there any attempt to understand this? It would be nice to hear why zero is a better value than any value < 1e-7 for the distance between the beam line and a point.
To my understanding, the problem is not zero vs. < 1e-7; it is the covariance matrix calculation.
This fix skips the covariance matrix calculation when a very small dist_mag is found. The reason it only happened in the 64-bit build could be that with 32 bits, zero was not zero but some finite number, which let the math go through without generating a NaN. However, I do not know how the NaN propagated to where the assertion failed.
It appears that the error transformation may break when we get closer to the nominal beam position. In other words, all the Jacobian components can become 0 if not enough precision is used to calculate them. This makes the further calculations pointless by yielding a NaN (0/0). We avoid this by applying a regularization based on an irreducible beam width on the order of a nanometer.
Delaying the calculation of denom_sqrt is nice.
I was able to reproduce the crash in the container we routinely use for CI tests. Our container environment is set to build 64-bit libraries with GCC optimization (-O2) enabled. On a farm node I can also see the problem with 64-bit -O2 libraries, but in a different event. Although I haven't checked, I assume that 32-bit -O2 libs don't see the problem because that's what we routinely use in FastOffline and nightly tests.

In the 64-bit case we seem to hit the limit of numerical calculations: specifically, when the vertex gets very close to the nominal beam line (within DBL_EPSILON by my estimate), the transformation of errors cannot be performed as coded, i.e. there is not enough precision to calculate the difference between two numbers, and the result is exact zero. If I understand correctly, with the x86 instruction set the calculations are carried out in the FPU with extended 80-bit precision, and that may be the reason why 32-bit jobs don't suffer from this limitation.

The proposed solution of limiting the minimal distance looks fine to me. The threshold of 1e-7 cm should have enough margin not to cause any issues when used in calculations involving beam line parameters.

Why do we get more vertices so close to the beam line? In the PPV vertex finder used for proton-proton data reconstruction, a vertex can be formed from a single track if its P_T is high enough. If the uncertainty on the track parameters is much worse than the uncertainty on the beam line, the fit result will be dominated by the latter.
I tried to compile the following code snippet with gcc 4.8.5 in 32-bit and 64-bit mode and got different assembly:

```cpp
double square(double num) {
    return num * num;
}
```

gcc -m32 -O2:

```asm
square(double):
        fld     QWORD PTR [esp+4]
        fmul    st, st(0)
        ret
```

gcc -m64 -O2:

```asm
square(double):
        mulsd   xmm0, xmm0
        ret
```

GCC manual
veprbl
left a comment
Great to have more comments in the code.
Co-authored-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com>
klendathu2k
left a comment
Generally looks good. I agree with the discussion that the different behavior between 32-bit and 64-bit is due to the FPU's use of 80-bit extended precision in intermediate calculations. There are possibly some tricks to improve floating-point stability in calculating denom_sqrt (such as expanding the expression and regrouping terms to avoid BIG# minus small# calculations...) if there are further problems.
Hmm. This PR has been approved and all checks have passed... but was not automagically merged. Squashing / merging now.
Running the following chain with the DEV library, bfc always failed in the 64-bit optimized version. The 32-bit optimized version has no problem. I did not test the DEBUG version.
Yuri provided me with this fix and, in my tests, it works.
The error message is: