Commit e78b931
feat(arrow/compute): sort support (#749)
## Summary
Implements stable `sort_indices` (and `sort` via `take`) for arrays,
chunked arrays, record batches, and tables using logical row indices
over `Chunked` data **without concatenating chunks**. The control flow
and ordering rules are modeled on Apache Arrow C++ **`vector_sort.cc` /
`vector_sort_internal.h`**, with a few **Go- and performance-driven**
differences called out below.
## Parity with Arrow C++ (`vector_sort.cc` / `vector_sort_internal.h`)
**Same overall structure**
- **Single sort key, one column**
  - **Multiple chunks:** per-chunk sort, then **pairwise merge** of sorted
    spans (C++ **ChunkedArraySorter** / **ChunkedMergeImpl** idea).
  - **Single chunk, no validity nulls and no null-likes:** direct stable
    sort on indices (C++ skips null partitioning when `null_count == 0`
    and there are no null-likes).
  - Otherwise: **partition validity nulls**, **partition float null-likes
    (NaN)**, stable sort of the remaining non-NaN values, then
    **VisitConstantRanges**-style handling of ties
    (`vector_sort_internal.go`).
- **Multiple sort keys**
  - **`len(keys) <= kMaxRadixSortKeys` (8):** **MSD radix** path per
    record-batch range (`radixRecordBatchSortRange` ↔
    **ConcreteRecordBatchColumnSorter::SortRange**).
  - **More than 8 keys:** **MultipleKeyRecordBatchSorter**-style global
    stable sort with lexicographic compare across keys
    (`multipleKeyRecordBatchSortRange`).
  - **Aligned chunk boundaries** across all keyed columns (typical
    table): sort **each chunk slice** with the same strategy, then
    **merge spans** like C++ **TableSorter** batch merge.
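
To make the single-key, single-chunk fallback concrete, here is a minimal sketch of the three phases described above: partition validity nulls, partition NaN, then stable-sort what remains. It is not the code from this commit. `stablePartition` and `sortIndicesFloat64` are hypothetical names, the input is a plain `[]float64` plus a `[]bool` validity slice rather than an Arrow array, and placing nulls and NaN at the end is an assumed default.

```go
package main

import (
	"cmp"
	"fmt"
	"math"
	"slices"
)

// stablePartition moves the rows for which keep returns true to the front of
// idx, preserving relative order on both sides, and returns the number kept.
// Hypothetical helper, not taken from the commit.
func stablePartition(idx []int, keep func(row int) bool) int {
	kept := make([]int, 0, len(idx))
	rest := make([]int, 0, len(idx))
	for _, row := range idx {
		if keep(row) {
			kept = append(kept, row)
		} else {
			rest = append(rest, row)
		}
	}
	n := copy(idx, kept)
	copy(idx[n:], rest)
	return n
}

// sortIndicesFloat64 returns row indices ordered as: non-NaN values ascending,
// then NaN rows, then validity nulls (valid[row] == false). Null placement
// "at end" is an assumption made for this sketch.
func sortIndicesFloat64(values []float64, valid []bool) []int {
	idx := make([]int, len(values))
	for i := range idx {
		idx[i] = i
	}
	// Phase 1: validity nulls go to the back.
	nonNull := stablePartition(idx, func(row int) bool { return valid[row] })
	// Phase 2: among non-null rows, NaN (null-like) goes to the back.
	nonNaN := stablePartition(idx[:nonNull], func(row int) bool {
		return !math.IsNaN(values[row])
	})
	// Phase 3: stable sort of the remaining comparable values.
	slices.SortStableFunc(idx[:nonNaN], func(a, b int) int {
		return cmp.Compare(values[a], values[b])
	})
	return idx
}

func main() {
	values := []float64{3, math.NaN(), 1, 0, 1}
	valid := []bool{true, true, true, false, true} // row 3 is a validity null
	fmt.Println(sortIndicesFloat64(values, valid)) // [2 4 0 1 3]
}
```

Rows 2 and 4 both hold `1` and keep their input order, which is the stability property the rest of this description relies on.
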
**Same ordering semantics (intended match to C++)**
- Per-key **ascending / descending** and **null placement** (including
**NaN** as null-like for floats).
- **Stable** ordering: merge and `slices.SortStableFunc` are used so
tie-breaking matches the C++ “left before right” stable merge behavior
where documented in code.
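
The "left before right" rule can be illustrated with a small merge over two already-sorted index spans. This is a sketch under assumptions, not the merge from `vector_sort.go`: `mergeSpans` is a hypothetical name, and it allocates a fresh output slice for clarity.

```go
package main

import "fmt"

// mergeSpans merges two sorted spans of row indices into one sorted span.
// An element from the right span is taken only when it is strictly smaller,
// so on ties the left element is emitted first and the merge stays stable.
func mergeSpans(left, right []int, compare func(a, b int) int) []int {
	out := make([]int, 0, len(left)+len(right))
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		if compare(right[j], left[i]) < 0 {
			out = append(out, right[j])
			j++
		} else {
			out = append(out, left[i])
			i++
		}
	}
	// At most one of these appends adds anything.
	out = append(out, left[i:]...)
	out = append(out, right[j:]...)
	return out
}

func main() {
	// Rows 0..2 and rows 3..5 are each already sorted by value.
	values := []int{1, 3, 3, 2, 3, 4}
	merged := mergeSpans([]int{0, 1, 2}, []int{3, 4, 5}, func(a, b int) int {
		return values[a] - values[b]
	})
	fmt.Println(merged) // [0 3 1 2 4 5]
}
```

Rows 1, 2 and 4 all hold `3`; the two from the left span come out before the one from the right span.
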
**Same “column comparator” role**
- Go **`columnComparator`** interface ↔ C++ **`ColumnComparator`**:
`compareRowsForKey`, null / null-like metadata, `columnHasValidityNulls`
(skip **PartitionNullsOnly** when there are no validity nulls).
**Physical types**
- One **monomorphic** comparator type per supported physical pattern in
**`vector_sort_physical.go`**, analogous to C++
**`ConcreteColumnComparator<T>`** (concrete `*array.T` + direct `Value`
/ `Cmp` / special cases for bool and intervals).
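
As a rough illustration of the comparator role, the sketch below pairs a one-method interface with two concrete, non-generic column types and a lexicographic multi-key compare. Only the names `columnComparator` and `compareRowsForKey` come from this description. The method signature, the `int64Column` and `stringColumn` types, and the `descending` field are assumptions, and null handling is omitted.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// columnComparator is a simplified stand-in for the interface named above.
// The real interface also exposes null metadata; this signature is assumed.
type columnComparator interface {
	compareRowsForKey(i, j int) int
}

// int64Column and stringColumn are hypothetical monomorphic comparators:
// one concrete type per physical type, no generics on the compare path.
type int64Column struct {
	values     []int64
	descending bool
}

func (c *int64Column) compareRowsForKey(i, j int) int {
	r := cmp.Compare(c.values[i], c.values[j])
	if c.descending {
		return -r
	}
	return r
}

type stringColumn struct {
	values     []string
	descending bool
}

func (c *stringColumn) compareRowsForKey(i, j int) int {
	r := cmp.Compare(c.values[i], c.values[j])
	if c.descending {
		return -r
	}
	return r
}

// compareRows compares two rows key by key; the first non-equal key decides.
func compareRows(keys []columnComparator, i, j int) int {
	for _, k := range keys {
		if r := k.compareRowsForKey(i, j); r != 0 {
			return r
		}
	}
	return 0
}

// sortIndicesMultiKey is a global stable sort with lexicographic compare.
func sortIndicesMultiKey(n int, keys []columnComparator) []int {
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}
	slices.SortStableFunc(idx, func(a, b int) int { return compareRows(keys, a, b) })
	return idx
}

func main() {
	keys := []columnComparator{
		&int64Column{values: []int64{2, 1, 2, 1}},                          // ascending
		&stringColumn{values: []string{"b", "a", "a", "b"}, descending: true}, // descending
	}
	fmt.Println(sortIndicesMultiKey(4, keys)) // [3 1 0 2]
}
```

The interface call selects the column, while the value comparison itself stays inside a concrete method.
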
## Intentional differences and rationale
| Area | C++ | This Go port |
| --- | --- | --- |
| **Resolving logical row → (chunk, offset)** | Chunk / resolver machinery in C++ | **Dense `logicalRowMap`**: one `rowMapCell{chunk, local}` per logical row when `len(chunks) > 1`; **`pair(i,j)`** resolves two rows in one shot. **Why:** random compares during sort/merge need O(1) resolution; a flat table + co-located fields beats repeated resolver work and improves locality vs separate `chunk`/`local` slices. |
| **`physicalColumnBase` methods** | N/A (different language) | **Pointer receivers** on `pair` / `isNullAtGlobal` / `cell`. **Why:** value receivers would copy slice headers (and map state) on every compare. |
| **Stable sort primitive** | `std::stable_sort` | **`slices.SortStableFunc`** (Go 1.21+). **Why:** library primitive; semantics aligned with stable weak ordering used elsewhere in the port. |
| **Column dispatch at runtime** | Templates + virtuals | **`columnComparator` interface** for “which column” in multi-key and merge loops. **Why:** idiomatic Go; per-type work stays in concrete `compareRowsForKey` implementations. |
| **Chunked merge with null-likes (e.g. float)** | C++ can **split** merge for null-like vs non-null-like regions (**ChunkedMergeImpl**) | **Single `less` over full row order** after per-chunk partitioning/sort. **Why:** simpler merge while preserving order as long as per-chunk phases match C++; documented in `vector_sort.go` comments. |
| **Generics for physical columns** | Templates instantiate fully | **Explicit monomorphs only** for the hot compare path. **Why:** measured regression vs Go generics on this hot path (inlining / assertions); verbosity traded for performance. |
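
The dense row map in the first table row can be sketched as follows. The names `logicalRowMap`, `rowMapCell`, and `pair` come from the table. The field types, the `newLogicalRowMap` constructor, and the use of plain `[][]int64` chunks are assumptions for illustration only.

```go
package main

import "fmt"

// rowMapCell co-locates the chunk index and the offset within that chunk.
// Field types are assumed; the commit may use narrower integers.
type rowMapCell struct {
	chunk int
	local int
}

// logicalRowMap holds one cell per logical row, so resolving a row is a
// single slice index instead of a search over chunk boundaries.
type logicalRowMap struct {
	cells []rowMapCell
}

// newLogicalRowMap is a hypothetical constructor that builds the dense map
// from chunk lengths. Empty chunks contribute no cells.
func newLogicalRowMap(chunkLens []int) *logicalRowMap {
	total := 0
	for _, n := range chunkLens {
		total += n
	}
	cells := make([]rowMapCell, 0, total)
	for c, n := range chunkLens {
		for l := 0; l < n; l++ {
			cells = append(cells, rowMapCell{chunk: c, local: l})
		}
	}
	return &logicalRowMap{cells: cells}
}

// pair resolves two logical rows in one call. The pointer receiver avoids
// copying the slice header on every compare.
func (m *logicalRowMap) pair(i, j int) (rowMapCell, rowMapCell) {
	return m.cells[i], m.cells[j]
}

func main() {
	chunks := [][]int64{{5, 1}, {}, {4, 2, 3}}
	m := newLogicalRowMap([]int{2, 0, 3})
	a, b := m.pair(1, 2) // logical rows 1 and 2 live in different chunks
	fmt.Println(a, b, chunks[a.chunk][a.local] < chunks[b.chunk][b.local])
	// {0 1} {2 0} true
}
```

The cost is one cell per logical row, which is the memory-for-lookup-speed tradeoff the table describes.
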
## File Layout
- `arrow/compute/vector_sort.go` — `sort_indices` / `sort` registration
and datum dispatch.
- `arrow/compute/vector_sort_test.go` — functional tests.
- `arrow/compute/internal/kernels/vector_sort.go` — orchestration,
merge, `SortIndices` kernel.
- `arrow/compute/internal/kernels/vector_sort_internal.go` — null
partitions, radix / multi-key batch sort.
- `arrow/compute/internal/kernels/vector_sort_support.go` —
`logicalRowMap` and ordering helpers.
- `arrow/compute/internal/kernels/vector_sort_physical.go` — per-type
column comparators.
- `arrow/compute/internal/kernels/vector_sort_bench_test.go` —
benchmarks.
## Testing
- `go test ./arrow/compute -run TestSort -count=1`
- Benchmarks: `go test ./arrow/compute/internal/kernels -bench=BenchmarkSortIndices -benchmem`.
## References
- Arrow C++: `cpp/src/arrow/compute/kernels/vector_sort.cc` and
`vector_sort_internal.h` (and related comparators).
- https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/kernels/vector_sort.cc
- https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/kernels/vector_sort_internal.h
## Related Issues
- Closes #661

Parent ea84305, commit e78b931. 9 files changed: 4207 additions and 1 deletion.