
feat: push down IS NULL / IS NOT NULL on struct columns into Parquet scan #21796

Open
Druva-D wants to merge 1 commit into apache:main from Druva-D:nested_struct_null_pushdown




Druva-D commented Apr 23, 2026

Which issue does this PR close?

Rationale for this change

IS NULL / IS NOT NULL on struct columns is blanket-rejected from Parquet row filter pushdown, forcing all leaf columns to be materialized post-scan just to check nullability. In Parquet, definition levels encode struct nullability independently — arrow-rs reconstructs the struct's null bitmap from any single leaf. This PR exploits that to push down struct null checks while reading only one leaf column.
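As a rough illustration of why one leaf is enough, here is a toy model (not arrow-rs code) of Parquet definition levels for a hypothetical schema `optional group s { optional int32 a; optional int32 b; }`. Every leaf under `s` records definition level 0 in rows where `s` itself is null, so thresholding any single leaf's levels at the struct's own level recovers the same validity bitmap, even for a leaf that is null in every row where `s` is present.

```rust
// Toy model of Parquet definition levels (not arrow-rs code) for the
// hypothetical schema:
//   optional group s { optional int32 a; optional int32 b; }
// Each leaf has max definition level 2:
//   0 => `s` is null
//   1 => `s` is present, but this leaf is null
//   2 => this leaf has a value

/// Definition level contributed by the optional struct `s` itself.
const STRUCT_DEF_LEVEL: i16 = 1;

/// Rebuild the struct's validity bitmap (`true` means `s IS NOT NULL`)
/// from the definition levels of any single leaf under `s`.
fn struct_validity(leaf_def_levels: &[i16]) -> Vec<bool> {
    leaf_def_levels
        .iter()
        .map(|&level| level >= STRUCT_DEF_LEVEL)
        .collect()
}
```

For the rows `{a: 1, b: null}`, `null`, `{a: null, b: 2}`, `{a: null, b: null}`, leaf `a` stores levels `[2, 0, 1, 1]` and leaf `b` stores `[1, 0, 2, 1]`; both yield the validity `[true, false, true, true]`.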

| Scenario | No pushdown | With pushdown | Speedup |
|---|---|---|---|
| SELECT id WHERE s IS NOT NULL | 749ms | 63ms | 11.9x |
| SELECT * WHERE s IS NOT NULL | 733ms | 1040ms | 0.7x |

The speedup applies when the struct is filtered but not projected. SELECT * shows no benefit (0.7x, i.e. a slowdown) because all leaves must be read for the output anyway.

What changes are included in this PR?

  • Intercept IS NULL(Column(struct)) / IS NOT NULL(Column(struct)) in PushdownChecker::f_down before the Column node triggers the blanket struct rejection
  • Resolve null checks to only the first Parquet leaf via resolve_struct_null_check_leaves(), and merge that leaf path into the field access paths in build_filter_schema to avoid a schema/mask mismatch when a null check is combined with get_field in an OR expression
  • Renamed struct_data_structures_prevent_pushdown to struct_is_not_null_allows_pushdown and flipped its assertion, since struct null checks are now supported
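The checker change can be sketched on a toy expression tree. Everything below (`Ty`, `Expr`, `first_leaf`, `pushdown_leaves`) is hypothetical and far simpler than DataFusion's `PushdownChecker` and `resolve_struct_null_check_leaves()`; it only illustrates the ordering idea: match `IS NULL` / `IS NOT NULL` on a struct column before recursing into the `Column` child (where a bare struct column is still rejected), and map the check to the struct's first leaf, descending into the first child at each level of a nested struct.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; not DataFusion's or arrow's types.
#[derive(Debug, Clone)]
enum Ty {
    Int32,
    Struct(Vec<(String, Ty)>),
}

#[derive(Debug)]
enum Expr {
    Column(String),
    IsNull(Box<Expr>),
    IsNotNull(Box<Expr>),
    Not(Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

type LeafPath = Vec<String>;

/// Path to the first leaf under `name`, descending into the first child of
/// every nested struct. Returns None for an empty struct (no leaf to read).
fn first_leaf(name: &str, ty: &Ty) -> Option<LeafPath> {
    let mut path = vec![name.to_string()];
    let mut cur = ty;
    while let Ty::Struct(fields) = cur {
        let (child, child_ty) = fields.first()?;
        path.push(child.clone());
        cur = child_ty;
    }
    Some(path)
}

/// Leaf paths a filter needs, or None if it cannot be pushed down.
fn pushdown_leaves(expr: &Expr, schema: &HashMap<String, Ty>) -> Option<Vec<LeafPath>> {
    match expr {
        // Checked before visiting the child, so a struct operand never
        // reaches the bare-Column rejection below.
        Expr::IsNull(inner) | Expr::IsNotNull(inner) => {
            if let Expr::Column(name) = &**inner {
                if let Some(ty) = schema.get(name) {
                    if matches!(ty, Ty::Struct(_)) {
                        return first_leaf(name, ty).map(|path| vec![path]);
                    }
                }
            }
            pushdown_leaves(inner, schema)
        }
        Expr::Column(name) => match schema.get(name)? {
            // Blanket rejection of bare struct columns (unchanged).
            Ty::Struct(_) => None,
            Ty::Int32 => Some(vec![vec![name.clone()]]),
        },
        Expr::Not(inner) => pushdown_leaves(inner, schema),
        Expr::Or(left, right) => {
            // Merge both sides' paths so each leaf is requested only once.
            let mut leaves = pushdown_leaves(left, schema)?;
            leaves.extend(pushdown_leaves(right, schema)?);
            leaves.sort();
            leaves.dedup();
            Some(leaves)
        }
    }
}
```

With `s: {a, b}` in the schema, `IS NOT NULL` on `s` resolves to the single path `["s", "a"]`, while a bare `Column("s")` still returns `None`.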

Are these changes tested?

Yes — 10 unit tests (pushdown acceptance, correctness with all-null leaves, nested structs, OR expressions with combined null check + field access, NOT wrapping), plus an integration test verifying pushdown_rows_pruned/pushdown_rows_matched metrics through the full SessionContext pipeline.

Tests generated with the help of Claude Code

Are there any user-facing changes?

No. Struct null check pushdown activates automatically when pushdown_filters is enabled.

github-actions added the labels core (Core DataFusion crate) and datasource (Changes to the datasource crate) Apr 23, 2026

Labels

core (Core DataFusion crate), datasource (Changes to the datasource crate)


Development

Successfully merging this pull request may close these issues.

Push down IS NULL / IS NOT NULL on struct columns into Parquet scan
