Skip to content

fix: correct IFMA vector mul carry propagation#816

Merged
gbotrel merged 5 commits intomasterfrom
fix/aliasingvec
Mar 16, 2026
Merged

fix: correct IFMA vector mul carry propagation#816
gbotrel merged 5 commits intomasterfrom
fix/aliasingvec

Conversation

@gbotrel
Copy link
Copy Markdown
Collaborator

@gbotrel gbotrel commented Mar 12, 2026

Summary

This fixes a correctness bug in the AVX-512 IFMA vector multiplication kernel used by 4-word fields.

The failure was reproduced in linea-monorepo/prover on bls12-377/fr during:

go test -count=1 -run 'TestLookup' -timeout 5m ./circuits/pi-interconnection/keccak/prover/protocol/compiler/recursion/

The root cause was in the generated IFMA Montgomery path: the fused x16 normalization extracted carries from all shifted radix-52 limbs in parallel, but that carry chain is sequential. Each carry changes the next limb before that limb's carry is known. For some operand pairs, that produced an incorrect result even with a distinct output buffer.

This PR:

  • fixes the carry propagation in the IFMA generator by making the fused x16 normalization sequential
  • keeps the generated aliasing coverage in vector tests
  • adds a non-generated bls12-377/fr regression test for the concrete IFMA failure shape
  • regenerates the shared assembly and affected wrappers/tests

Testing

Ran:

go generate ./internal/generator
go test -count=1 ./ecc/bls12-377/fr ./ecc/bls12-381/fr ./ecc/bn254/fr ./field/babybear ./field/koalabear
go test -count=1 -run 'TestLookup' -timeout 5m ./circuits/pi-interconnection/keccak/prover/protocol/compiler/recursion/

Benchmark

Compared this branch against origin/master with a small external benchmark harness targeting github.com/consensys/gnark-crypto/ecc/bls12-377/fr.Vector.Mul on sizes 512, 4096, and 65536, using benchstat over 10 samples per case.

Results:

VectorMul512    +6.33%
VectorMul4096   +5.74%
VectorMul65536  +6.42%
geomean         +6.16%

No allocation change was observed.


Note

High Risk
Touches performance-critical, security-adjacent finite-field arithmetic: the AVX-512 IFMA Montgomery vector-mul carry chain is changed and the generated assembly is regenerated, so any mistake would cause silent incorrect crypto computations on IFMA-capable CPUs.

Overview
Fixes a correctness bug in the AVX-512 IFMA Montgomery vector multiplication path for 4-word fields. The IFMA generator and generated element_4w_amd64.s now propagate the fused x16 normalization carry chain sequentially (instead of extracting carries in parallel), and the IFMA transpose path is simplified to keep a consistent lane order without extra VPERMQ permutations.

Strengthens test coverage for vector ops. Adds deterministic aliasing tests for Add/Sub/ScalarMul/Mul across many curves/fields, plus IFMA-only edge-case and fuzz tests (and a targeted bls12-377/fr regression) to catch carry-chain and in-place/destination-aliasing failures. Regenerates affected *_amd64.s include hashes and templates accordingly.

Written by Cursor Bugbot for commit 0be4e2a. This will update automatically on new commits. Configure here.

Copilot AI review requested due to automatic review settings March 12, 2026 22:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a correctness bug in the AVX-512 IFMA vector multiplication carry propagation for 4-word fields. The parallel carry extraction was incorrect because each carry can affect the next limb's value before its own carry is computed. The fix makes the carry chain sequential, at a ~6% performance cost.

Changes:

  • Sequential carry propagation in the IFMA Montgomery multiplication generator and regenerated assembly
  • New TestVectorAliasing test (template + generated) covering aliased vector operations
  • Non-generated regression test for the specific bls12-377/fr failure case

Reviewed changes

Copilot reviewed 39 out of 41 changed files in this pull request and generated no comments.

Show a summary per file
File Description
internal/generator/field/asm/amd64/element_vec_4words_ifma.go Core fix: sequential carry propagation instead of parallel
field/asm/element_4w/element_4w_amd64.s Regenerated assembly with sequential CARRY_PROP macro calls
internal/generator/field/template/element/testvector.go.tmpl Added deterministicVector helper and TestVectorAliasing test template
ecc/bls12-377/fr/vector_ifma_regression_test.go Non-generated regression test for the specific IFMA carry bug
ecc/*/fr/vector_test.go, ecc/*/fp/vector_test.go, field/*/vector_test.go Generated aliasing tests for all affected field packages
ecc/*/element_amd64.s Updated hashes to force recompilation against new shared assembly
internal/generator/field/template/element/vectoropsamd64*.go.tmpl Minor trailing newline fixes

You can also share your feedback on Copilot code review. Take the survey.

@gbotrel gbotrel requested review from Tabaie, ivokub and yelhousni March 12, 2026 22:52
gusiri added a commit to Consensys/linea-monorepo that referenced this pull request Mar 12, 2026
Update gnark-crypto to commit 048069ff09e521dbfc806ae80f7f08e4348ee9d9
which fixes the mulVec IFMA assembly bug on AVX-512 IFMA capable CPUs.
reference: Consensys/gnark-crypto#816

This resolves keccak/BLS12-377 unit test failures (TestHeartBeat,
TestLookup, TestSelfRecursionOpsSisSingleLayered,
TestSelfRecursionOpsSisMultiLayered) that occurred on machines with
avx512ifma support.
ivokub
ivokub previously approved these changes Mar 12, 2026
Copy link
Copy Markdown
Collaborator

@ivokub ivokub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look good, the assembly part I don't understand that well.

ivokub
ivokub previously approved these changes Mar 12, 2026
@gbotrel gbotrel merged commit be4aeba into master Mar 16, 2026
13 checks passed
@gbotrel gbotrel deleted the fix/aliasingvec branch March 16, 2026 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants