Skip to content

perf(bw6): faster direct Fp6 mul#776

Merged
yelhousni merged 3 commits intomasterfrom
perf/bw6-Fp6-direct
Dec 15, 2025
Merged

perf(bw6): faster direct Fp6 mul#776
yelhousni merged 3 commits intomasterfrom
perf/bw6-Fp6-direct

Conversation

@yelhousni
Copy link
Copy Markdown
Collaborator

@yelhousni yelhousni commented Dec 11, 2025

Description

Montgomery in [1] presented a formula for multiplying quintic polynomials that needs one less multiplication than Karatsuba, but not much was said about the number of additions. In [2] the authors adapt this formula to compute the multiplication in direct sextic extensions Fp6. They report in Appendix A.1 a cost of 17M + 143A + 5B, where M, A, B stand for multiplication, addition and multiplication by the sextic non-residue β. In the case of BW6-761 β=4 so B=2A and the overall cost is 17M + 153A. The previous gnark-crypto implementation was using 17M + 222A. In this PR we implement the product in 17M + 130A.

Type of change

  • New feature (non-breaking change which adds functionality)

How has this been tested?

Consistency between direct extension and 2-3 tower multiplications.

How has this been benchmarked?

benchmark                        old ns/op     new ns/op     delta
BenchmarkE6DMulMontgomery6-8     5142          3181          -38.14%

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I did not modify files generated from templates
  • golangci-lint does not output errors locally
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

Note

Replaces E6D.Mul with an optimized Montgomery-6 implementation and introduces a direct Montgomery-6 squaring routine, with corresponding property tests and benchmarks.

  • Fp6 (direct) arithmetic (ecc/bw6-761/internal/fptower/e6_direct.go):
    • Mul: Switch default to mulMontgomery6 and reimplement with a staged, 17M/low-additions Montgomery-6 algorithm.
    • Square: Add squareMontgomery6 (direct Montgomery-6 squaring) plus squareTower helper; Square currently uses tower path.
    • Minor: adjust SetOne comment.
  • Tests (ecc/bw6-761/internal/fptower/e6_direct_test.go):
    • Property: direct square matches tower square.
    • Benchmarks: add BenchmarkE6DSquareTower and BenchmarkE6DSquareMontgomery6; keep mul benchmark using Montgomery-6.

Written by Cursor Bugbot for commit 029b8fe. This will update automatically on new commits. Configure here.

@yelhousni yelhousni requested a review from gbotrel December 11, 2025 00:40
@yelhousni yelhousni self-assigned this Dec 11, 2025
@yelhousni yelhousni added this to the v0.19.N milestone Dec 11, 2025
@yelhousni yelhousni merged commit e3a9f02 into master Dec 15, 2025
14 checks passed
@yelhousni yelhousni deleted the perf/bw6-Fp6-direct branch December 15, 2025 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants