Commit 9660c98
perf: Use zero-copy slice instead of take kernel in sort merge join (#20463)
## Summary
Follows on from #20464, which
adds new Criterion benchmarks.
- When the join indices form a contiguous ascending range (e.g.
`[3,4,5,6]`), replace the O(n) Arrow `take` kernel with O(1)
`RecordBatch::slice` (zero-copy pointer arithmetic)
- Applies to both the streamed (left) and buffered (right) sides of the
sort merge join
## Rationale
In SMJ, the streamed side cursor advances sequentially, so its indices
are almost always contiguous. The buffered side is scanned sequentially
within each key group, so its indices are also contiguous for 1:1 and
1:few joins. The `take` kernel allocates new arrays and copies data even
when a simple slice would suffice.
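To make the cost difference concrete, here is a minimal sketch on plain Rust slices rather than Arrow arrays. `take_rows` and `slice_rows` are hypothetical helpers written for this illustration; they are not the Arrow `take` kernel or `RecordBatch::slice`, only stand-ins for the gather-and-copy versus view-into-existing-buffer distinction described above.

```rust
/// Gather path, analogous in spirit to a `take` kernel: allocates a new
/// buffer and copies one element per index, so the cost grows with
/// `indices.len()`.
fn take_rows<T: Clone>(column: &[T], indices: &[u64]) -> Vec<T> {
    indices.iter().map(|&i| column[i as usize].clone()).collect()
}

/// Slice path: returns a view into the existing buffer. Nothing is
/// allocated or copied; only an offset and a length are computed.
fn slice_rows<T>(column: &[T], offset: usize, len: usize) -> &[T] {
    &column[offset..offset + len]
}
```

For the indices `[3, 4, 5, 6]`, `take_rows(column, &[3, 4, 5, 6])` and `slice_rows(column, 3, 4)` yield the same values, but only the first allocates a new buffer.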
## Benchmark Results
Criterion micro-benchmark (100K rows, pre-sorted, no sort/scan
overhead):
| Benchmark | Baseline | Optimized | Improvement |
|-----------|----------|-----------|-------------|
| inner_1to1 (unique keys) | 5.11 ms | 3.88 ms | **-24%** |
| inner_1to10 (10K keys) | 17.64 ms | 16.29 ms | **-8%** |
| left_1to1_unmatched (5% unmatched) | 4.80 ms | 3.87 ms | **-19%** |
| left_semi_1to10 (10K keys) | 3.65 ms | 3.11 ms | **-15%** |
| left_anti_partial (partial match) | 3.58 ms | 3.43 ms | **-4%** |
All improvements are statistically significant (p < 0.05).
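The Improvement column is consistent with a relative change of `(optimized - baseline) / baseline`, rounded to the nearest percent. That formula is an assumption inferred from the table rows, not something the commit states, and `improvement_pct` is a hypothetical helper name.

```rust
/// Relative change in percent, rounded to the nearest integer.
/// Negative values mean the optimized run was faster.
fn improvement_pct(baseline_ms: f64, optimized_ms: f64) -> i64 {
    ((optimized_ms - baseline_ms) / baseline_ms * 100.0).round() as i64
}
```

For example, the `inner_1to1` row is `improvement_pct(5.11, 3.88)`, which evaluates to `-24`.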
TPC-H SF1 with SMJ forced (`prefer_hash_join=false`) shows no
regressions across all 22 queries, with modest end-to-end improvements
on join-heavy queries (Q3 -7%, Q19 -5%, Q21 -2%).
## Implementation
- `is_contiguous_range()`: checks if a `UInt64Array` is a contiguous
ascending range. Uses quick endpoint rejection then verifies every
element sequentially.
- `freeze_streamed()`: uses `slice` instead of `take` for streamed
(left) columns when indices are contiguous.
- `fetch_right_columns_from_batch_by_idxs()`: uses `slice` instead of
`take` for buffered (right) columns when indices are contiguous.
When the indices are not contiguous (e.g. repeated indices in many-to-many
joins), both functions fall back to the existing `take` path.
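The check-then-dispatch logic described above can be sketched as follows. This is a minimal illustration on plain slices, not the code from this commit: it takes `&[u64]` instead of a `UInt64Array`, the names `contiguous_range` and `select_rows` are hypothetical, and treating an empty index list as "not contiguous" is an assumption made here for simplicity.

```rust
use std::borrow::Cow;
use std::ops::Range;

/// Returns `Some(start..end)` when `indices` is exactly
/// `start, start + 1, ..., end - 1`; otherwise `None`.
fn contiguous_range(indices: &[u64]) -> Option<Range<usize>> {
    // Empty input yields `None` here (an assumption of this sketch).
    let (first, last) = (*indices.first()?, *indices.last()?);
    // Quick endpoint rejection: a contiguous run of n values spans n - 1.
    if last.checked_sub(first)? != (indices.len() - 1) as u64 {
        return None;
    }
    // Endpoints alone are not enough (e.g. [3, 5, 4, 6]), so verify
    // that every element is its predecessor plus one.
    if !indices.windows(2).all(|w| w[0].checked_add(1) == Some(w[1])) {
        return None;
    }
    let start = usize::try_from(first).ok()?;
    let end = start.checked_add(indices.len())?;
    Some(start..end)
}

/// Borrows a sub-slice when the indices are contiguous (the "slice" path),
/// and otherwise gathers into a new buffer (the "take" fallback).
fn select_rows<'a, T: Clone>(column: &'a [T], indices: &[u64]) -> Cow<'a, [T]> {
    match contiguous_range(indices) {
        Some(range) if range.end <= column.len() => Cow::Borrowed(&column[range]),
        _ => Cow::Owned(indices.iter().map(|&i| column[i as usize].clone()).collect()),
    }
}
```

With this sketch, `[3, 4, 5, 6]` takes the borrowed path, while `[1, 1, 2]` (a repeated index, as in a many-to-many join) falls back to the copying path. The input `[3, 5, 4, 6]` passes the endpoint test but fails the element-by-element check, which is why both steps are needed.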
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent bfc012e commit 9660c98
1 file changed
Lines changed: 49 additions & 17 deletions