fix: correct IFMA vector mul carry propagation #816
Merged
Pull request overview
This PR fixes a correctness bug in the AVX-512 IFMA vector multiplication carry propagation for 4-word fields. The parallel carry extraction was incorrect because each carry can affect the next limb's value before its own carry is computed. The fix makes the carry chain sequential, at a ~6% performance cost.
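To make the distinction concrete, here is a minimal scalar sketch of the two strategies on five radix-52 limbs. It is a toy model, not the generated assembly: the `limbs` type and the `normalizeParallel`, `normalizeSequential`, and `isNormalized` names are hypothetical, and the carry out of the top limb is simply dropped.

```go
package main

import "fmt"

const (
	limbBits = 52
	limbMask = (uint64(1) << limbBits) - 1
)

// limbs holds a value in radix 2^52, least significant limb first.
// (Hypothetical type for illustration only.)
type limbs [5]uint64

// normalizeParallel extracts every carry from the *input* limbs at once
// and adds each one to the next masked limb. A limb that overflows only
// after receiving its incoming carry is left above 52 bits.
func normalizeParallel(in limbs) limbs {
	var out limbs
	for i := range in {
		out[i] = in[i] & limbMask
		if i > 0 {
			out[i] += in[i-1] >> limbBits
		}
	}
	return out
}

// normalizeSequential ripples the carry: each limb absorbs the incoming
// carry first, and only then is its own carry computed.
func normalizeSequential(in limbs) limbs {
	var out limbs
	var carry uint64
	for i := range in {
		t := in[i] + carry
		out[i] = t & limbMask
		carry = t >> limbBits
	}
	return out
}

// isNormalized reports whether every limb fits in 52 bits.
func isNormalized(l limbs) bool {
	for _, v := range l {
		if v > limbMask {
			return false
		}
	}
	return true
}

func main() {
	// Limb 0 carries 1 into limb 1, which is already at its maximum.
	in := limbs{limbMask + 6, limbMask, 0, 0, 0}

	p := normalizeParallel(in)
	s := normalizeSequential(in)
	fmt.Println("parallel:  ", p, "normalized:", isNormalized(p))
	fmt.Println("sequential:", s, "normalized:", isNormalized(s))
}
```

With this input the parallel pass leaves limb 1 equal to 2^52, one past the mask, while the sequential pass ripples the carry on into limb 2. Any later step that assumes 52-bit limbs would then go wrong, which is one plausible way such a bug could surface only for some operand pairs.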
Changes:
- Sequential carry propagation in the IFMA Montgomery multiplication generator and regenerated assembly
- New `TestVectorAliasing` test (template + generated) covering aliased vector operations
- Non-generated regression test for the specific `bls12-377/fr` failure case
Reviewed changes
Copilot reviewed 39 out of 41 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `internal/generator/field/asm/amd64/element_vec_4words_ifma.go` | Core fix: sequential carry propagation instead of parallel |
| `field/asm/element_4w/element_4w_amd64.s` | Regenerated assembly with sequential `CARRY_PROP` macro calls |
| `internal/generator/field/template/element/testvector.go.tmpl` | Added `deterministicVector` helper and `TestVectorAliasing` test template |
| `ecc/bls12-377/fr/vector_ifma_regression_test.go` | Non-generated regression test for the specific IFMA carry bug |
| `ecc/*/fr/vector_test.go`, `ecc/*/fp/vector_test.go`, `field/*/vector_test.go` | Generated aliasing tests for all affected field packages |
| `ecc/*/element_amd64.s` | Updated hashes to force recompilation against new shared assembly |
| `internal/generator/field/template/element/vectoropsamd64*.go.tmpl` | Minor trailing newline fixes |
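The aliasing tests listed above exercise vector operations whose destination shares storage with an input. A minimal, self-contained sketch of that test pattern follows. It uses a toy 31-bit prime field instead of gnark-crypto's types, and `mulVec`, `buggyMul`, `seededVector`, and `checkAliasing` are hypothetical names chosen for illustration, not the PR's code.

```go
package main

import "fmt"

// p is a toy modulus (2^31 - 1) so that products fit in a uint64.
const p = uint64(2147483647)

// mulVec is a stand-in kernel: dst[i] = a[i]*b[i] mod p.
// It reads both inputs before writing, so it tolerates aliasing.
func mulVec(dst, a, b []uint64) {
	for i := range dst {
		dst[i] = a[i] * b[i] % p
	}
}

// buggyMul writes to dst before it has finished reading b,
// so it breaks when dst and b share storage.
func buggyMul(dst, a, b []uint64) {
	for i := range dst {
		dst[i] = a[i] % p
		dst[i] = dst[i] * b[i] % p
	}
}

// seededVector fills a vector with reproducible values below p
// using a simple linear congruential generator.
func seededVector(n int, seed uint64) []uint64 {
	v := make([]uint64, n)
	x := seed
	for i := range v {
		x = x*6364136223846793005 + 1442695040888963407
		v[i] = (x >> 33) % p
	}
	return v
}

func equal(a, b []uint64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func clone(v []uint64) []uint64 { return append([]uint64(nil), v...) }

// checkAliasing runs mul with each destination layout and returns the
// name of the first layout whose result differs from a scalar
// reference, or "" if all layouts agree.
func checkAliasing(mul func(dst, a, b []uint64), n int) string {
	a, b := seededVector(n, 1), seededVector(n, 2)
	want, wantSq := make([]uint64, n), make([]uint64, n)
	for i := 0; i < n; i++ {
		want[i] = a[i] * b[i] % p
		wantSq[i] = a[i] * a[i] % p
	}

	dst := make([]uint64, n)
	mul(dst, a, b)
	if !equal(dst, want) {
		return "distinct dst"
	}
	x := clone(a)
	mul(x, x, b)
	if !equal(x, want) {
		return "dst aliases a"
	}
	y := clone(b)
	mul(y, a, y)
	if !equal(y, want) {
		return "dst aliases b"
	}
	z := clone(a)
	mul(z, z, z)
	if !equal(z, wantSq) {
		return "dst aliases a and b"
	}
	return ""
}

func main() {
	fmt.Printf("mulVec:   %q\n", checkAliasing(mulVec, 16))
	fmt.Printf("buggyMul: %q\n", checkAliasing(buggyMul, 16))
}
```

Checking the distinct-destination layout first matters here: the PR description notes that the IFMA bug produced wrong results even with a distinct output buffer, so an aliasing-only check would not have isolated it.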
gusiri added a commit to Consensys/linea-monorepo that referenced this pull request Mar 12, 2026
Update gnark-crypto to commit 048069ff09e521dbfc806ae80f7f08e4348ee9d9 which fixes the mulVec IFMA assembly bug on AVX-512 IFMA capable CPUs. reference: Consensys/gnark-crypto#816 This resolves keccak/BLS12-377 unit test failures (TestHeartBeat, TestLookup, TestSelfRecursionOpsSisSingleLayered, TestSelfRecursionOpsSisMultiLayered) that occurred on machines with avx512ifma support.
ivokub
previously approved these changes
Mar 12, 2026
Collaborator
ivokub
left a comment
The tests look good, the assembly part I don't understand that well.
ivokub
previously approved these changes
Mar 12, 2026
ivokub
approved these changes
Mar 16, 2026
Summary
This fixes a correctness bug in the AVX-512 IFMA vector multiplication kernel used by 4-word fields.
The failure was reproduced in `linea-monorepo/prover` on `bls12-377/fr` during:
The root cause was in the generated IFMA Montgomery path: the fused `x16` normalization extracted carries from all shifted radix-52 limbs in parallel, but that carry chain is sequential. Each carry changes the next limb before that limb's carry is known. For some operand pairs, that produced an incorrect result even with a distinct output buffer.
This PR:
- makes the `x16` normalization sequential
- adds a `bls12-377/fr` regression test for the concrete IFMA failure shape
Testing
Ran:
Benchmark
Compared this branch against `origin/master` with a small external benchmark harness targeting `github.com/consensys/gnark-crypto/ecc/bls12-377/fr.Vector.Mul` on sizes 512, 4096, and 65536, using `benchstat` over 10 samples per case.
Results:
No allocation change was observed.
Note
High Risk
Touches performance-critical, security-adjacent finite-field arithmetic: the AVX-512 IFMA Montgomery vector-mul carry chain is changed and the generated assembly is regenerated, so any mistake would cause silent incorrect crypto computations on IFMA-capable CPUs.
Overview
Fixes a correctness bug in the AVX-512 IFMA Montgomery vector multiplication path for 4-word fields. The IFMA generator and generated `element_4w_amd64.s` now propagate the fused x16 normalization carry chain sequentially (instead of extracting carries in parallel), and the IFMA transpose path is simplified to keep a consistent lane order without extra `VPERMQ` permutations.
Strengthens test coverage for vector ops. Adds deterministic aliasing tests for `Add`/`Sub`/`ScalarMul`/`Mul` across many curves/fields, plus IFMA-only edge-case and fuzz tests (and a targeted `bls12-377/fr` regression) to catch carry-chain and in-place/destination-aliasing failures. Regenerates affected `*_amd64.s` include hashes and templates accordingly.
Written by Cursor Bugbot for commit 0be4e2a.