
make sort32 fast #327

Open
39ali wants to merge 2 commits into sparkjsdev:main from 39ali:sort32-fast

Conversation


@39ali 39ali commented Apr 29, 2026

This tries to improve the performance of sort32; on average it's 30-40% faster.

Things that changed:

  • pass 2 no longer re-reads keys[]; scratch stores a packed u64 of (inverted_key << 32 | original_index). Pass 2 reads the high 16 bits directly from scratch with kv >> 48, making it a sequential scan

  • the histogram and scatter loops are now branchless, which helps LLVM vectorize them

  • manually unrolled the histogram and both scatter passes to 8-wide
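The packed-scratch scheme from the first bullet can be sketched as a standalone function. This is an illustrative version only, not the PR's actual sort.rs code: the names `sort32_descending`, `radix_key`, and `exclusive_prefix_sum` are hypothetical, and the real implementation additionally makes the loops branchless and unrolls them 8-wide.

```rust
// Hypothetical sketch of a two-pass radix-2^16 sort of f32 bit patterns,
// descending, returning the permutation of original indices.
fn sort32_descending(keys: &[f32]) -> Vec<u32> {
    const RADIX: usize = 1 << 16;
    let n = keys.len();

    // Monotone f32 -> u32 mapping, then bitwise inversion so that an
    // ASCENDING radix sort yields DESCENDING float order.
    let radix_key = |f: f32| -> u32 {
        let b = f.to_bits();
        let mono = if b >> 31 == 1 { !b } else { b | 0x8000_0000 };
        !mono
    };

    // Pack (inverted_key << 32 | original_index) once; both passes then
    // only touch packed u64s and never re-read `keys`.
    let mut scratch: Vec<u64> = keys
        .iter()
        .enumerate()
        .map(|(i, &f)| ((radix_key(f) as u64) << 32) | i as u64)
        .collect();
    let mut tmp = vec![0u64; n];

    // Pass 1: stable counting sort on the low 16 key bits (bits 32..48).
    let mut hist = vec![0u32; RADIX];
    for &kv in &scratch {
        hist[((kv >> 32) & 0xFFFF) as usize] += 1;
    }
    exclusive_prefix_sum(&mut hist);
    for &kv in &scratch {
        let d = ((kv >> 32) & 0xFFFF) as usize;
        tmp[hist[d] as usize] = kv;
        hist[d] += 1;
    }

    // Pass 2: the high 16 key bits come straight out of the packed value
    // with kv >> 48, so this is a sequential scan of `tmp`.
    let mut hist = vec![0u32; RADIX];
    for &kv in &tmp {
        hist[(kv >> 48) as usize] += 1;
    }
    exclusive_prefix_sum(&mut hist);
    for &kv in &tmp {
        let d = (kv >> 48) as usize;
        scratch[hist[d] as usize] = kv;
        hist[d] += 1;
    }

    // The low 32 bits of each packed value are the original index.
    scratch.iter().map(|&kv| (kv & 0xFFFF_FFFF) as u32).collect()
}

// Turns per-digit counts into starting offsets for the scatter.
fn exclusive_prefix_sum(buckets: &mut [u32]) {
    let mut sum = 0u32;
    for b in buckets.iter_mut() {
        let count = *b;
        *b = sum;
        sum += count;
    }
}
```

Correctness hinges on each counting-sort pass being stable: pass 2 sorts on the high 16 bits while preserving the low-16-bit order established by pass 1.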

Comment thread on rust/spark-worker-rs/src/sort.rs (Outdated)
/// Two‑pass radix sort (base 2¹⁶) of 32‑bit float bit‑patterns,
/// descending order (largest keys first). Mirrors the JS `sort32Splats`.
#[inline(always)]
unsafe fn prefix_sum_exclusive(buckets: &mut [u32]) -> u32 {
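For context, nothing in a function of this shape requires `unsafe`. A safe sketch (illustrative, not the PR's exact body) that replaces each bucket count with its exclusive running sum and returns the total could look like:

```rust
/// Safe sketch of an exclusive prefix sum over radix bucket counts.
/// Each bucket ends up holding the starting offset for its digit;
/// the return value is the total element count.
fn prefix_sum_exclusive(buckets: &mut [u32]) -> u32 {
    let mut running = 0u32;
    for b in buckets.iter_mut() {
        let count = *b;
        *b = running; // exclusive: offset before adding this bucket
        running += count;
    }
    running
}
```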
Collaborator

Is there a specific reason this is marked unsafe? It compiles just fine without.

Author

I ran many experiments with SIMD, and none of them made it even marginally faster, so I removed it for simplicity's sake but forgot to remove the unsafe. Will clean it up.

Collaborator

mrxz commented Apr 29, 2026

Awesome work. I gave it a try and can confirm that it improves sorting performance: in my limited testing I saw a ~20% reduction in sorting time (~25% faster).

manually unrolled histogram and both scatter passes to 8-wide

Without this change the performance gain seems to be roughly the same, or at least I didn't observe any significant difference. The majority of the benefit seems to come from making it branchless.

Author

39ali commented Apr 29, 2026

@mrxz I squeezed out a bit more performance (~<=1ms) by removing more branches from the hot loops. What you noticed seems about right: the gain will differ from one Wasm engine to another and from one architecture to another (especially cache sizes), so it's hard to give a solid number, but it will still be a bump in performance.

