Is your feature request related to a problem or challenge?
IN list filters have become critical-path operations for dynamic filter pushdown. When scanning large tables with partition pruning or dynamic filters, the IN expression is evaluated millions of times per query. The current generic implementation leaves significant performance on the table.
Describe the solution you'd like
Implement type-specialized StaticFilter implementations that exploit the properties of different data types to achieve substantial performance improvements.
Optimization Strategies
1. Bitmap Filters for Small Integer Types
For 1-byte and 2-byte types, use fixed-size bitmaps for O(1) membership testing:
- UInt8/Int8: 256-bit bitmap (32 bytes, stack-allocated, fits in cache line)
- UInt16/Int16: 65536-bit bitmap (8KB, heap-allocated, fits in L1 cache)
2. Const-Generic Branchless Evaluation
For small IN lists on larger primitives, use compile-time-known array sizes:
- Loop unrolling via
fold(false, |acc, &v| acc | (v == needle))
- Branch elimination (no conditional jumps)
- Thresholds: ≤32 for 4-byte, ≤16 for 8-byte, ≤4 for 16-byte types
3. Direct Probe Hash Filter
For larger primitive lists, use a custom open-addressing hash table:
- Linear probing with 25% load factor
- Golden-ratio multiply hash function
- Direct array storage (no indirection like
std::HashSet)
4. Type Normalization (Zero-Copy)
Normalize types via buffer reinterpretation (only bit patterns matter for equality):
- Signed integers → Unsigned (
Int32 → UInt32)
- Floats → Unsigned (
Float32 → UInt32)
- Timestamp/Date/Duration → underlying integer type
5. Two-Stage String Filters
- Stage 1: O(1) rejection using length + prefix encoded as i128
- Stage 2: Full verification only for long string candidates
Filter Selection Strategy
instantiate_static_filter(array)
│
├─ Utf8View + all_short_strings? ──────────► Branchless/Hash i128
│
├─ primitive_width=1 ──────────────────────► Bitmap (32 bytes)
├─ primitive_width=2 ──────────────────────► Bitmap (8 KB)
│
├─ primitive_width=4
│ ├─ len ≤ 32 ────────────────────────────► Branchless<UInt32, N>
│ └─ len > 32 ────────────────────────────► DirectProbeFilter<UInt32>
│
├─ primitive_width=8
│ ├─ len ≤ 16 ────────────────────────────► Branchless<UInt64, N>
│ └─ len > 16 ────────────────────────────► DirectProbeFilter<UInt64>
│
├─ primitive_width=16
│ ├─ len ≤ 4 ─────────────────────────────► Branchless<Decimal128, N>
│ └─ len > 4 ─────────────────────────────► DirectProbeFilter<Decimal128>
│
├─ Utf8 / LargeUtf8 ───────────────────────► Utf8TwoStageFilter
├─ Utf8View / BinaryView ──────────────────► ByteViewMaskedFilter
│
└─ Everything else ────────────────────────► NestedTypeFilter (fallback)
Benchmark Results
Benchmarks from the reference implementation (see #19390 ):
| Category |
Representative Benchmark |
Speedup |
| Bitmap (1-byte) |
u8/list=16/match=50% |
8.5× |
| Bitmap (2-byte) |
i16/list=64/match=50% |
8.5× |
| Branchless (4-byte) |
i32/list=4/match=50% |
10.4× |
| Branchless (8-byte) |
i64/list=4/match=50% |
11.1× |
| Branchless (timestamp) |
timestamp_ns/list=4/match=50% |
13.6× |
| Hash (large lists) |
i32/hash/list=256/match=50% |
2.4× |
| Utf8View (short) |
utf8view/short_8b/nulls=50% |
6.3× |
| Utf8View (boundary) |
utf8view/boundary_12b/match=50% |
7.0× |
| Utf8 (short) |
utf8/short_8b/list=256/match=50% |
2.3× |
| Realistic |
port_numbers/pool=1000/filter=10 |
3.9× |
| Realistic |
status_codes/nulls=20% |
4.8× |
| Realistic |
email_addresses/list=20 |
3.0× |
Implementation Plan
A reference implementation is available in #19390 . To facilitate review, it will be submitted as a series of incremental PRs:
PR 1: Comprehensive Benchmark Suite
PR 2: Modular StaticFilter Architecture
PR 3: Bitmap Filter for UInt8 (stack-based)
PR 4: Bitmap Filter for UInt16 (heap-based)
PR 5: Zero-Copy Type Reinterpretation
PR 6: Branchless Filter for Small Primitive Lists
PR 7: Direct Probe Hash Filter for Large Primitive Lists
PR 8: ByteView Two-Stage Filter (Utf8View/BinaryView)
PR 9: Utf8/LargeUtf8 Two-Stage Filter
Additional Context
Is your feature request related to a problem or challenge?
IN list filters have become critical-path operations for dynamic filter pushdown. When scanning large tables with partition pruning or dynamic filters, the IN expression is evaluated millions of times per query. The current generic implementation leaves significant performance on the table.
Describe the solution you'd like
Implement type-specialized
StaticFilterimplementations that exploit the properties of different data types to achieve substantial performance improvements.Optimization Strategies
1. Bitmap Filters for Small Integer Types
For 1-byte and 2-byte types, use fixed-size bitmaps for O(1) membership testing:
2. Const-Generic Branchless Evaluation
For small IN lists on larger primitives, use compile-time-known array sizes:
fold(false, |acc, &v| acc | (v == needle))3. Direct Probe Hash Filter
For larger primitive lists, use a custom open-addressing hash table:
std::HashSet)4. Type Normalization (Zero-Copy)
Normalize types via buffer reinterpretation (only bit patterns matter for equality):
Int32→UInt32)Float32→UInt32)5. Two-Stage String Filters
Filter Selection Strategy
Benchmark Results
Benchmarks from the reference implementation (see #19390 ):
Implementation Plan
A reference implementation is available in #19390 . To facilitate review, it will be submitted as a series of incremental PRs:
PR 1: Comprehensive Benchmark Suite
benches/in_list_optimizations.rscovering all optimization pathsPR 2: Modular StaticFilter Architecture
InListExprto use trait-basedStaticFilterarchitectureNestedTypeFilteras fallback for complex typesPR 3: Bitmap Filter for UInt8 (stack-based)
BitmapFilter<U8Config>(stack-allocated, 32 bytes)PR 4: Bitmap Filter for UInt16 (heap-based)
BitmapFilter<U16Config>(heap-allocated, 8KB)PR 5: Zero-Copy Type Reinterpretation
reinterpret_any_primitive_to<T>()for buffer reinterpretationPR 6: Branchless Filter for Small Primitive Lists
BranchlessFilter<T, N>with const-generic sizes (0-32)ReinterpretedBranchlessfor type normalizationPR 7: Direct Probe Hash Filter for Large Primitive Lists
DirectProbeFilter<T>with open-addressing hash tableDirectProbeHashabletrait with golden-ratio hashPR 8: ByteView Two-Stage Filter (Utf8View/BinaryView)
ByteViewMaskedFilter<T>for Utf8View/BinaryViewPR 9: Utf8/LargeUtf8 Two-Stage Filter
Utf8TwoStageFilter<O>for Utf8/LargeUtf8Additional Context
StaticFiltersfor different data types #18824 (Restore IN_LIST performance)