Skip to content

Stabilize calculation of uncertainties for vertices extremely close to beam line#279

Merged
klendathu2k merged 3 commits intostar-bnl:mainfrom
kehw:FixGenericVertexMaker
Feb 1, 2022
Merged

Stabilize calculation of uncertainties for vertices extremely close to beam line#279
klendathu2k merged 3 commits intostar-bnl:mainfrom
kehw:FixGenericVertexMaker

Conversation

@kehw
Copy link
Copy Markdown
Member

@kehw kehw commented Jan 12, 2022

By running the following chain with DEV library, bfc always failed in 64Bit optimized version. 32Bit optimized version has no problem. I did not test the DEBUG version.
Yuri provided me with this fix and in my tests, it works.

root4star -b -q 'bfc.C(100,"pp2022,StiCA,BEmcChkStat,epdhit,-hitfilt,-evout,-fcsCluster,-fcsPoint ","/star/data03/daq/2022/010/23010027/st_physics_23010027_raw_1000015.daq")'

The error message is:

StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=25 accepted, nAnyMatch=3 nAnyVeto=31
StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=26 accepted, nAnyMatch=2 nAnyVeto=12
StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=28 accepted, nAnyMatch=4 nAnyVeto=13
StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=29 accepted, nAnyMatch=3 nAnyVeto=11
StiMaker:INFO  - StPPVertexFinder::evalVertex Vid=30 accepted, nAnyMatch=4 nAnyVeto=21
root4star: .sl73_x8664_gcc485/OBJ/StRoot/StarRoot/THelixTrack.cxx:802: double THelixTrack::Step(const double*, double*, double*) const: Assertion `iter' failed.
*** Problem in THElixTrack::Step(vtx) ***
double vtx[3]={nan,nan,nan};
 THelixTrack::this = 0x7ffdbbd964f0
 THelixTrack::fX[3] = { -0.478721 , -0.794980 ,110.145058 }
 THelixTrack::fP[3] = { -0.814047 , 0.490203 ,-0.311495 }
 THelixTrack::fRho  =   0.003078

@kehw kehw added the bug Something isn't working label Jan 12, 2022
@kehw kehw requested review from genevb and plexoos January 12, 2022 20:44
@plexoos
Copy link
Copy Markdown
Member

plexoos commented Jan 12, 2022

Was there any attempt to understand this? It would be nice to hear why zero is a better value than any value < 1e-7 for the distance between the beam line and a point.

@kehw
Copy link
Copy Markdown
Member Author

kehw commented Jan 12, 2022

To my understanding, the problem is not zero .vs. < 1e-7. It is the cov. matrix calculation.
When dist_mag is zero, there must be a divide by zero error happened in the cov. matrix calculation, which eventually leads to

double vtx[3]={nan,nan,nan};

This fix skips the cov. matrix calculation when a very small dist_mag is found.

The reason for it only happened in 64bit build could be that with 32bit, zero was not zero, but some finite number, which let the math go through without generating NAN. However, I do not know how the nan passed to where the assertion failed.

It appears that the error transformation may break when we get closer to
the nominal beam position. In other words, all the Jacobian components
can become 0 if not enough precision is used to calculate them. This
makes the further calculations pointless by yielding a NaN (0/0). We
avoid this by doing a reqularization on an irreducible beam width on the
order of a nanometer.
@kehw
Copy link
Copy Markdown
Member Author

kehw commented Jan 14, 2022

Delay the calculation of denom_sqrt is nice.

@plexoos
Copy link
Copy Markdown
Member

plexoos commented Jan 14, 2022

I was able to reproduce the crash in the container we routinely use for CI tests. Our container environment is set to build 64 bit libraries with Gcc optimization (-O2) enabled. On a farm node I can also see the problem with 64-bit -O2 libraries but in a different event. Although haven't checked, I assume that 32bit -O2 libs don't see the problem because that's what we routinely use in FastOffline and nightly tests.

In case of 64b we seem to hit the limit of numerical calculations, specifically, for cases when the vertex gets very close to the nominal beam line (within DBL_EPSILON by my estimate) the transformation of errors cannot be performed as coded, i.e. not enough precision to calculate the difference between two numbers results in exact zero.

If I understand correctly, with x86 instruction set the calculation are carried out in FPU with extended 80bit precision and that maybe the reason why 32bit jobs don't suffer from this limitation.

The proposed solution of limiting the minimal distance looks fine to me. The threshold of 1e-7 cm should have enough margin not to cause any issues when used in calculation involving beam line parameters.

Why do we get more vertices so close to the beam line? In PPV vertex finder used for proton-proton data reconstructions, a vertex can be formed with a single track if its P_T is high enough. If the uncertainty on the track parameters is much worse than the uncertainty on the beam line, the fit result will be dominated by the latter.

@kehw
Copy link
Copy Markdown
Member Author

kehw commented Jan 14, 2022

If I understand correctly, with x86 instruction set the calculation are carried out in FPU with extended 80bit precision and that maybe the reason why 32bit jobs don't suffer from this limitation.

I tried to compile the following code snippet with gcc 4.8.5 with 32 and 64 mode and got different assembly.
Indeed, FPU is used in 32 bit mode and SSE is used in 64 bit mode.

double square(double num) {
    return num * num;
}
; gcc -m32 -O2
square(double):
        fld     QWORD PTR [esp+4]
        fmul    st, st(0)
        ret
; gcc -m64 -O2
square(double):
        mulsd   xmm0, xmm0
        ret

GCC manual
noted the different choices in these two modes.

-mfpmath=unit
Generate floating-point arithmetic for selected unit unit. The choices for unit are:

‘387’
...
This is the default choice for non-Darwin x86-32 targets.

‘sse’
... For the x86-64 compiler, these extensions are enabled by default.

The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80 bits.

Copy link
Copy Markdown
Member

@veprbl veprbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to have more comments in the code.

Comment thread StRoot/StGenericVertexMaker/StGenericVertexFinder.cxx Outdated
Comment thread StRoot/StGenericVertexMaker/StGenericVertexFinder.cxx Outdated
Comment thread StRoot/StGenericVertexMaker/StGenericVertexFinder.cxx Outdated
Co-authored-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com>
Copy link
Copy Markdown
Contributor

@klendathu2k klendathu2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Agree with the discussion that different behavior between 32bit and 64bit is due to the use of 80bit precision in intermediate calculations due to FPU usage. There are possibly some tricks to improve floating point stability in calculating denom_sqrt (such as expanding the expression and regrouping terms to avoid BIG# minus small# calculations...) if thee are further problems.

@plexoos plexoos changed the title fix assertion failure in DEV 64bit Run22 pp510 data production Stabilize calculation of uncertainties for vertices extremely close to beam line Jan 20, 2022
@klendathu2k
Copy link
Copy Markdown
Contributor

Hmm. This PR has been approved and all checks have passed... but was not automagically merged. Squashing / merging now.

@klendathu2k klendathu2k merged commit 4641c7d into star-bnl:main Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants