
perf: improve small size fft #791

Merged
gbotrel merged 1 commit into master from perf/fft
Jan 9, 2026


gbotrel (Collaborator) commented Jan 8, 2026

Description

Improve small-size FFT runtime (~50%) by avoiding recursion overhead for small sizes (n ≤ 1024).


Note

Optimizes small FFT sizes by bypassing recursion for n=512 and n=1024.

  • Add kerDIFNP_512/kerDITNP_512 and kerDIFNP_1024/kerDITNP_1024 in field/babybear/fft/fft.go and field/koalabear/fft/fft.go
  • Update difFFT/ditFFT to directly call these kernels when stage >= twiddlesStartStage and n ∈ {512,1024}
  • Extend generator template internal/generator/field/template/fft/fft.go.tmpl to emit the same kernels and conditional dispatch (guarded by HasASMKernel)

Written by Cursor Bugbot for commit 21ae715.

Copilot AI review requested due to automatic review settings January 8, 2026 19:00

Copilot AI left a comment


Pull request overview

This PR improves small-FFT performance (~50% for sizes ≤ 1024) by adding specialized kernels for 512- and 1024-element FFTs that avoid recursion overhead. The new paths perform the outer butterfly stages inline before delegating to the existing 256-element kernels.

  • Adds dedicated kernels (kerDIFNP_512, kerDITNP_512, kerDIFNP_1024, kerDITNP_1024) for both DIF and DIT FFT algorithms
  • Updates dispatch logic to route 512 and 1024-element FFTs to specialized kernels when ASM support is available
  • Generated code applied to babybear and koalabear field implementations
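The DIT kernels can be sketched the same way, under the same assumptions: the KoalaBear prime 2^31 - 2^24 + 1 with plain `uint64` arithmetic, twiddles computed on the fly, hypothetical helper names, and `ker256` as a stand-in for the existing 256-point kernel. In this textbook radix-2 decimation-in-time formulation (bit-reversed input, natural-order output), the 256-point kernels run first and the outer butterfly layers are applied afterwards; the generated `kerDITNP_512`/`kerDITNP_1024` may organize the work differently.

```go
package main

import "fmt"

// Hypothetical sketch: plain uint64 arithmetic modulo the KoalaBear prime
// p = 2^31 - 2^24 + 1 stands in for the library's field elements.
const p uint64 = 2130706433

func add(a, b uint64) uint64 { return (a + b) % p }
func sub(a, b uint64) uint64 { return (a + p - b) % p }
func mul(a, b uint64) uint64 { return a * b % p } // a, b < 2^31: no overflow

func pow(b, e uint64) uint64 {
	r := uint64(1)
	for ; e > 0; e >>= 1 {
		if e&1 == 1 {
			r = mul(r, b)
		}
		b = mul(b, b)
	}
	return r
}

// rootOfUnity: primitive n-th root of unity (n a power of two, n >= 2, n | p-1).
func rootOfUnity(n uint64) uint64 {
	for g := uint64(2); ; g++ {
		if w := pow(g, (p-1)/n); pow(w, n/2) != 1 {
			return w
		}
	}
}

// ditStage merges two half-size transforms held in a[:m] and a[m:].
func ditStage(a []uint64, w uint64) {
	m, t := len(a)/2, uint64(1)
	for i := 0; i < m; i++ {
		u, v := a[i], mul(a[i+m], t)
		a[i], a[i+m] = add(u, v), sub(u, v)
		t = mul(t, w)
	}
}

// ditRec is the plain recursion: bit-reversed input, natural-order output.
func ditRec(a []uint64, w uint64) {
	if n := len(a); n >= 2 {
		ditRec(a[:n/2], mul(w, w))
		ditRec(a[n/2:], mul(w, w))
		ditStage(a, w)
	}
}

// ker256 stands in for the existing 256-point kernel (hypothetical).
func ker256(a []uint64, w uint64) { ditRec(a, w) }

// ker512: two 256-point kernels first, then one outer layer.
func ker512(a []uint64, w uint64) {
	ker256(a[:256], mul(w, w))
	ker256(a[256:], mul(w, w))
	ditStage(a, w)
}

// ker1024: four 256-point kernels, then the two outer layers.
func ker1024(a []uint64, w uint64) {
	w2 := mul(w, w)
	for q := 0; q < 4; q++ {
		ker256(a[q*256:(q+1)*256], mul(w2, w2))
	}
	ditStage(a[:512], w2)
	ditStage(a[512:], w2)
	ditStage(a, w)
}

// ditFFT recurses, but any 512- or 1024-point (sub)problem uses a fixed-size kernel.
func ditFFT(a []uint64, w uint64) {
	switch n := len(a); {
	case n == 1024:
		ker1024(a, w)
	case n == 512:
		ker512(a, w)
	case n >= 2:
		ditFFT(a[:n/2], mul(w, w))
		ditFFT(a[n/2:], mul(w, w))
		ditStage(a, w)
	}
}

func main() {
	for _, n := range []int{512, 1024} {
		// x = delta at index 1 has DFT X[k] = w^k; in bit-reversed
		// layout its single nonzero entry sits at position n/2.
		w, a := rootOfUnity(uint64(n)), make([]uint64, n)
		a[n/2] = 1
		ditFFT(a, w)
		ok := true
		for k, t := 0, uint64(1); k < n; k, t = k+1, mul(t, w) {
			ok = ok && a[k] == t
		}
		fmt.Println(n, ok)
	}
}
```

`main` feeds in the bit-reversed image of a unit impulse at index 1, whose transform is X[k] = w^k, and checks every output entry against successive powers of w.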

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| internal/generator/field/template/fft/fft.go.tmpl | Adds template code for 512/1024 element kernels and updates dispatch logic with new size checks (conditional on HasASMKernel) |
| field/koalabear/fft/fft.go | Generated code: adds four new kernel functions and updates difFFT/ditFFT dispatch to check for n==512 and n==1024 |
| field/babybear/fft/fft.go | Generated code: adds four new kernel functions and updates difFFT/ditFFT dispatch to check for n==512 and n==1024 |

gbotrel merged commit 95ff899 into master Jan 9, 2026
20 checks passed
gbotrel deleted the perf/fft branch January 9, 2026 14:28
