Stabilize calculation of uncertainties for vertices extremely close to beam line by kehw · Pull Request #279 · star-bnl/star-sw

kehw · 2022-01-12T20:44:38Z

By running the following chain with DEV library, bfc always failed in 64Bit optimized version. 32Bit optimized version has no problem. I did not test the DEBUG version.
Yuri provided me with this fix and in my tests, it works.

root4star -b -q 'bfc.C(100,"pp2022,StiCA,BEmcChkStat,epdhit,-hitfilt,-evout,-fcsCluster,-fcsPoint ","/star/data03/daq/2022/010/23010027/st_physics_23010027_raw_1000015.daq")'

The error message is:

StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=25 accepted, nAnyMatch=3 nAnyVeto=31
StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=26 accepted, nAnyMatch=2 nAnyVeto=12
StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=28 accepted, nAnyMatch=4 nAnyVeto=13
StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=29 accepted, nAnyMatch=3 nAnyVeto=11
StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=30 accepted, nAnyMatch=4 nAnyVeto=21
root4star: .sl73_x8664_gcc485/OBJ/StRoot/StarRoot/THelixTrack.cxx:802: double THelixTrack::Step(const double*, double*, double*) const: Assertion `iter' failed.
*** Problem in THElixTrack::Step(vtx) ***
double vtx[3]={nan,nan,nan};
 THelixTrack::this = 0x7ffdbbd964f0
 THelixTrack::fX[3] = { -0.478721 , -0.794980 ,110.145058 }
 THelixTrack::fP[3] = { -0.814047 , 0.490203 ,-0.311495 }
 THelixTrack::fRho  =   0.003078

…x comes from Yuri Fisyak.

plexoos · 2022-01-12T21:56:35Z

Was there any attempt to understand this? It would be nice to hear why zero is a better value than any value < 1e-7 for the distance between the beam line and a point.

kehw · 2022-01-12T22:14:59Z

To my understanding, the problem is not zero .vs. < 1e-7. It is the cov. matrix calculation.
When dist_mag is zero, there must be a divide by zero error happened in the cov. matrix calculation, which eventually leads to

double vtx[3]={nan,nan,nan};

This fix skips the cov. matrix calculation when a very small dist_mag is found.

The reason for it only happened in 64bit build could be that with 32bit, zero was not zero, but some finite number, which let the math go through without generating NAN. However, I do not know how the nan passed to where the assertion failed.

It appears that the error transformation may break when we get closer to the nominal beam position. In other words, all the Jacobian components can become 0 if not enough precision is used to calculate them. This makes the further calculations pointless by yielding a NaN (0/0). We avoid this by doing a reqularization on an irreducible beam width on the order of a nanometer.

kehw · 2022-01-14T03:49:49Z

Delay the calculation of denom_sqrt is nice.

plexoos · 2022-01-14T17:00:44Z

I was able to reproduce the crash in the container we routinely use for CI tests. Our container environment is set to build 64 bit libraries with Gcc optimization (-O2) enabled. On a farm node I can also see the problem with 64-bit -O2 libraries but in a different event. Although haven't checked, I assume that 32bit -O2 libs don't see the problem because that's what we routinely use in FastOffline and nightly tests.

In case of 64b we seem to hit the limit of numerical calculations, specifically, for cases when the vertex gets very close to the nominal beam line (within DBL_EPSILON by my estimate) the transformation of errors cannot be performed as coded, i.e. not enough precision to calculate the difference between two numbers results in exact zero.

If I understand correctly, with x86 instruction set the calculation are carried out in FPU with extended 80bit precision and that maybe the reason why 32bit jobs don't suffer from this limitation.

The proposed solution of limiting the minimal distance looks fine to me. The threshold of 1e-7 cm should have enough margin not to cause any issues when used in calculation involving beam line parameters.

Why do we get more vertices so close to the beam line? In PPV vertex finder used for proton-proton data reconstructions, a vertex can be formed with a single track if its P_T is high enough. If the uncertainty on the track parameters is much worse than the uncertainty on the beam line, the fit result will be dominated by the latter.

kehw · 2022-01-14T18:31:34Z

If I understand correctly, with x86 instruction set the calculation are carried out in FPU with extended 80bit precision and that maybe the reason why 32bit jobs don't suffer from this limitation.

I tried to compile the following code snippet with gcc 4.8.5 with 32 and 64 mode and got different assembly.
Indeed, FPU is used in 32 bit mode and SSE is used in 64 bit mode.

double square(double num) {
    return num * num;
}

; gcc -m32 -O2
square(double):
        fld     QWORD PTR [esp+4]
        fmul    st, st(0)
        ret

; gcc -m64 -O2
square(double):
        mulsd   xmm0, xmm0
        ret

GCC manual
noted the different choices in these two modes.

-mfpmath=unit
Generate floating-point arithmetic for selected unit unit. The choices for unit are:

‘387’
...
This is the default choice for non-Darwin x86-32 targets.

‘sse’
... For the x86-64 compiler, these extensions are enabled by default.

The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80 bits.

veprbl

Great to have more comments in the code.

Co-authored-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com>

klendathu2k

Generally looks good. Agree with the discussion that different behavior between 32bit and 64bit is due to the use of 80bit precision in intermediate calculations due to FPU usage. There are possibly some tricks to improve floating point stability in calculating denom_sqrt (such as expanding the expression and regrouping terms to avoid BIG# minus small# calculations...) if thee are further problems.

klendathu2k · 2022-02-01T15:57:33Z

Hmm. This PR has been approved and all checks have passed... but was not automagically merged. Squashing / merging now.

fix an assertion failure in 64bit Run22 pp510 data production. The fi…

66bcc86

…x comes from Yuri Fisyak.

kehw added the bug Something isn't working label Jan 12, 2022

kehw requested review from genevb and plexoos January 12, 2022 20:44

kehw requested review from akioogawa, klendathu2k and perevbnlgov as code owners January 12, 2022 20:44

plexoos approved these changes Jan 14, 2022

View reviewed changes

veprbl reviewed Jan 15, 2022

View reviewed changes

Comment thread StRoot/StGenericVertexMaker/StGenericVertexFinder.cxx Outdated

Comment thread StRoot/StGenericVertexMaker/StGenericVertexFinder.cxx Outdated

veprbl reviewed Jan 17, 2022

View reviewed changes

Comment thread StRoot/StGenericVertexMaker/StGenericVertexFinder.cxx Outdated

Update StRoot/StGenericVertexMaker/StGenericVertexFinder.cxx

d7e6d69

Co-authored-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com>

klendathu2k approved these changes Jan 19, 2022

View reviewed changes

plexoos changed the title ~~fix assertion failure in DEV 64bit Run22 pp510 data production~~ Stabilize calculation of uncertainties for vertices extremely close to beam line Jan 20, 2022

klendathu2k merged commit 4641c7d into star-bnl:main Feb 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stabilize calculation of uncertainties for vertices extremely close to beam line#279

Stabilize calculation of uncertainties for vertices extremely close to beam line#279
klendathu2k merged 3 commits intostar-bnl:mainfrom
kehw:FixGenericVertexMaker

kehw commented Jan 12, 2022

Uh oh!

plexoos commented Jan 12, 2022

Uh oh!

kehw commented Jan 12, 2022 •

edited

Loading

Uh oh!

kehw commented Jan 14, 2022

Uh oh!

plexoos commented Jan 14, 2022

Uh oh!

kehw commented Jan 14, 2022

Uh oh!

veprbl left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

klendathu2k left a comment

Uh oh!

klendathu2k commented Feb 1, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

kehw commented Jan 12, 2022

Uh oh!

plexoos commented Jan 12, 2022

Uh oh!

kehw commented Jan 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kehw commented Jan 14, 2022

Uh oh!

plexoos commented Jan 14, 2022

Uh oh!

kehw commented Jan 14, 2022

Uh oh!

veprbl left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

klendathu2k left a comment

Choose a reason for hiding this comment

Uh oh!

klendathu2k commented Feb 1, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kehw commented Jan 12, 2022 •

edited

Loading