perf: Skip RowFilter when all predicate columns are in the projection #20417
darmie wants to merge 8 commits into apache:main
Conversation
When all predicate columns are in the output projection, late materialization provides no I/O benefit. Replace the expensive RowFilter path with a lightweight batch-level filter to avoid CachedArrayReader/ReadPlanBuilder/try_next_batch overhead.
Add a dedicated test verifying that when all predicate columns are in the output projection, the opener skips RowFilter and applies a batch filter instead — and that both the batch filter and RowFilter paths produce correct results. Simplify the 4-way stream branching into two independent steps: first apply the empty-batch filter, then optionally wrap with EarlyStoppingStream.
Skip dynamic filter expressions (TopK, join pushdown) when deciding whether a predicate is single-conjunct. This preserves the batch filter optimization for queries like Q25 (WHERE col <> '' ORDER BY col LIMIT N) where TopK adds runtime conjuncts, while still routing multi-conjunct static predicates through RowFilter for incremental evaluation.
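A minimal sketch of the conjunct-counting rule described above. `Conjunct` and `static_conjunct_count` are illustrative stand-ins, not the PR's actual types: the point is only that runtime-injected filters (TopK, join pushdown) are excluded before testing whether the predicate is single-conjunct.

```rust
// Illustrative model (not DataFusion's API): a predicate split into
// conjuncts, where some conjuncts are dynamic runtime filters.
#[derive(Debug)]
struct Conjunct {
    expr: &'static str, // placeholder for a physical predicate expression
    is_dynamic: bool,   // true for runtime filters (TopK, join pushdown)
}

// Only static conjuncts count toward the "single-conjunct" test, so a
// TopK runtime bound does not disqualify a query from the batch filter.
fn static_conjunct_count(conjuncts: &[Conjunct]) -> usize {
    conjuncts.iter().filter(|c| !c.is_dynamic).count()
}

fn main() {
    // Q25-style predicate: one static conjunct plus a TopK runtime bound.
    let conjuncts = [
        Conjunct { expr: "col <> ''", is_dynamic: false },
        Conjunct { expr: "col <= <topk threshold>", is_dynamic: true },
    ];
    assert_eq!(static_conjunct_count(&conjuncts), 1);
}
```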
run benchmark clickbench_partitioned

🤖: Benchmark completed Details

run benchmark tpch

🤖: Benchmark completed Details

run benchmark tpcds

show benchmark queue

🤖 Hi @Dandandan, you asked to view the benchmark queue (#20417 (comment)).
Change is_subset to strict equality for predicate vs projection column indices. When there are non-predicate projection columns (e.g. SELECT * WHERE col = X), RowFilter provides significant value by skipping their decode for non-matching rows. Only skip RowFilter when every projected column is a predicate column. Also exclude dynamic filter expressions (TopK, join pushdown) when counting conjuncts, so runtime-generated filters don't prevent the batch filter optimization for single static predicates.
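The refined rule above can be sketched as a small predicate. This is an illustrative version with made-up names, not the PR's exact code: RowFilter is skipped only when the predicate columns are *exactly* the projection columns and at most one static conjunct remains; with extra projected columns (e.g. `SELECT * WHERE col = x`), RowFilter still saves decode work on non-matching rows, so it is kept.

```rust
use std::collections::BTreeSet;

// Hypothetical decision helper mirroring the strict-equality rule:
// skip RowFilter only when every projected column is a predicate column
// and the predicate has at most one static conjunct.
fn should_skip_row_filter(
    predicate_cols: &BTreeSet<usize>,
    projection_cols: &BTreeSet<usize>,
    static_conjuncts: usize,
) -> bool {
    static_conjuncts <= 1 && predicate_cols == projection_cols
}

fn main() {
    let pred: BTreeSet<usize> = [0].into_iter().collect();
    let proj_same: BTreeSet<usize> = [0].into_iter().collect();
    let proj_star: BTreeSet<usize> = [0, 1, 2].into_iter().collect();
    // Exact match: batch filter is enough.
    assert!(should_skip_row_filter(&pred, &proj_same, 1));
    // SELECT *-style projection: keep RowFilter to skip extra decodes.
    assert!(!should_skip_row_filter(&pred, &proj_star, 1));
}
```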
run benchmark clickbench_partitioned

show benchmark queue

🤖 Hi @Dandandan, you asked to view the benchmark queue (#20417 (comment)).

@alamb looks like the runner isn't working

run benchmark clickbench_partitioned

🤖 Hi @darmie, thanks for the request (#20417 (comment)).

I'll run the benchmark locally (
Pull request overview
This PR optimizes Parquet filter pushdown by skipping the RowFilter (late materialization) path when it provides no I/O benefit, addressing performance regressions identified in ClickBench queries.
Purpose:
The optimization recognizes that when all predicate columns must be decoded for the output projection anyway (and there's at most one static conjunct), the RowFilter machinery adds CPU overhead without providing I/O savings. In these cases, applying the predicate as a post-decode batch filter is more efficient.
Changes:
- Added logic to detect when predicate columns exactly match projection columns with ≤1 static conjunct
- Implemented batch-level filtering as an alternative to RowFilter in these cases
- Added empty batch filtering to remove batches with no rows after filtering
- Comprehensive test coverage for various predicate/projection combinations including multi-conjunct predicates and dynamic filters
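A toy scalar model of the batch-filter and empty-batch steps listed above. The real code operates on Arrow `RecordBatch` streams; here plain `Vec<i32>` stands in for a batch: evaluate the predicate on already-decoded values, keep matching rows, and drop batches left empty so downstream operators never see zero-row batches.

```rust
// Toy model (plain vectors instead of RecordBatches): apply the predicate
// per batch, then filter out batches with no surviving rows.
fn batch_filter<F: Fn(&i32) -> bool>(batches: Vec<Vec<i32>>, pred: F) -> Vec<Vec<i32>> {
    batches
        .into_iter()
        .map(|batch| batch.into_iter().filter(|v| pred(v)).collect::<Vec<i32>>())
        .filter(|batch| !batch.is_empty()) // the "empty batch filter" step
        .collect()
}

fn main() {
    let filtered = batch_filter(vec![vec![1, 5, 2], vec![0, 1]], |v| *v >= 2);
    // Second batch had no matching rows and is dropped entirely.
    assert_eq!(filtered, vec![vec![5, 2]]);
}
```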
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| datafusion/datasource-parquet/src/opener.rs | Core optimization logic to skip RowFilter when predicate columns match projection columns, apply batch-level filtering, and filter empty batches. Includes comprehensive test suite (Cases 1-6) validating different predicate/projection scenarios. |
| .gitignore | Added profiling-artifacts/ directory to ignore profiling outputs |
It reports some bigger slowdowns 🤔

🤖: Benchmark completed Details

run benchmark clickbench_partitioned

Looks like it is mostly better. The Incomparable one looks like it runs OOM though; does it currently maybe skip some filters, @darmie?

I'll investigate
When a conjunct references columns not in the output projection (e.g. COUNT(*) WHERE col = X), it cannot be evaluated as a batch filter because those columns are absent from the output schema. Keep such conjuncts in the RowFilter to avoid schema errors.
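The fix above can be sketched as a partitioning step. This is a hypothetical sketch with illustrative names: a conjunct may only be demoted to a post-decode batch filter if every column it references is in the output projection; otherwise it stays in the RowFilter, because the output schema would not contain the columns it needs.

```rust
use std::collections::BTreeSet;

// Illustrative conjunct: a name plus the column indices it references.
struct Conjunct {
    name: &'static str,
    cols: BTreeSet<usize>,
}

// Split conjuncts into (batch-filterable, must stay in RowFilter):
// only conjuncts whose columns all survive into the output projection
// can be evaluated against output batches without schema errors.
fn partition_conjuncts(
    conjuncts: Vec<Conjunct>,
    projection: &BTreeSet<usize>,
) -> (Vec<Conjunct>, Vec<Conjunct>) {
    conjuncts
        .into_iter()
        .partition(|c| c.cols.is_subset(projection))
}

fn main() {
    let projection: BTreeSet<usize> = [0].into_iter().collect();
    let conjuncts = vec![
        Conjunct { name: "c0 = 'x'", cols: [0].into_iter().collect() },
        // References column 5, absent from the output (COUNT(*)-style query).
        Conjunct { name: "c5 > 10", cols: [5].into_iter().collect() },
    ];
    let (batch, row) = partition_conjuncts(conjuncts, &projection);
    assert_eq!(batch[0].name, "c0 = 'x'");
    assert_eq!(row[0].name, "c5 > 10");
}
```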
🤖: Benchmark completed Details

@Dandandan The Q1 and Q20 problem was that the filter referenced columns outside the output projection, so the batch filter could not evaluate it against the output schema. I just pushed a fix: if the filter references columns that aren't in the output projection, it stays in the RowFilter. Let's run the bench again and see.

run benchmark clickbench_partitioned
```rust
let has_extra_cols = projection_col_indices
    .iter()
    .any(|idx| !conjunct_cols.contains(idx));
// 2. The conjunct references columns NOT in the output
```
Hmm, but wouldn't it be better to fix the reader schema to add the demoted columns?
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated no new comments.
🤖: Benchmark completed Details

Seems quite beneficial, no regressions (I'll kick it off once more)

run benchmark clickbench_partitioned

🤖: Benchmark completed Details
Would this still be beneficial for highly selective predicates in late materialization? Even if all predicate columns are in the projection, would we not win I/O by incrementally reducing the data fetched per each
For the demoted case (single conjunct, filter cols = projection cols), RowFilter has to decode the same columns to evaluate the filter; there are no extra columns whose decode it can skip for non-matching rows. The savings are zero, but the RowFilter machinery still adds overhead, so the batch filter wins. For multi-conjunct predicates on different columns, RowFilter's incremental evaluation can still skip work, so those keep the RowFilter path. The one scenario where a single conjunct could still benefit despite covering all projection columns is page-index pruning, which skips entire pages. That's captured by
run benchmark clickbench_partitioned

🤖: Benchmark completed Details

Nice, thanks for the explanation @darmie!
To be honest, these results are pretty compelling. However, I am surprised that Q23 doesn't get substantially faster (in this case the dynamic filter from the TopK should be pushed down).
```rust
.with_pushdown_filters(true)
.with_reorder_filters(true)
.build();
let stream = opener.open(file.clone()).unwrap().await.unwrap();
```
I feel like this test should also be asserting something about the predicates more directly (this is asserting the number of rows that come out, rather than the fact that the filter is pushed down).
```rust
// Filter pushdown: evaluate predicates during scan.
if let Some(predicate) = pushdown_filters.then_some(predicate).flatten() {
    let row_filter = row_filter::build_row_filter(
```
If we are deciding which filters to push down based on projection and filter columns, is the ParquetOpener the right place? I wonder if we should move the determination earlier (like maybe don't bother trying to push down filters at all?).
## Which issue does this PR close?

N/A

## Rationale for this change

Some PRs are being omitted from the stale check because they were in a cache, and the workflow appears to lack permission to delete that cache, so they are forever stuck as unprocessed. For example, in this run: https://github.com/apache/datafusion/actions/runs/24756695077/job/72431314533

Seeing this in the logs:

```
[apache#20473] issue skipped due being processed during the previous run
[apache#20460] pull request skipped due being processed during the previous run
[apache#20448] issue skipped due being processed during the previous run
[apache#20443] issue skipped due being processed during the previous run
[apache#20435] issue skipped due being processed during the previous run
[apache#20418] issue skipped due being processed during the previous run
[apache#20417] pull request skipped due being processed during the previous run
[apache#20416] pull request skipped due being processed during the previous run
[apache#20403] pull request skipped due being processed during the previous run
```

And at the end we see this warning:

```
Warning: Error delete _state: [403] Resource not accessible by integration - https://docs.github.com/rest/actions/cache#delete-github-actions-caches-for-a-repository-using-a-cache-key
```

The stale workflow uses a cache in case it hits the `operations-per-run` limit meant to prevent API rate limiting (we have the default of 30), so it seems we previously hit this limit and some issues/PRs were cached, and have never been uncached since, so they are never processed again. See: https://github.com/actions/stale#operations-per-run

## What changes are included in this PR?

Give the stale workflow permission to run GitHub Actions operations (like deleting caches). See recommended permissions: https://github.com/actions/stale#recommended-permissions

## Are these changes tested?

## Are there any user-facing changes?
## Which issue does this PR close?

## Rationale for this change

When `pushdown_filters = true` and all predicate columns are already in the output projection, the arrow-rs `RowFilter` (late materialization) machinery provides zero I/O benefit: those columns must be decoded for the projection anyway. Yet the RowFilter adds substantial CPU overhead from `CachedArrayReader`, `ReadPlanBuilder::with_predicate`, and `ParquetDecoderState::try_next_batch` (~1100 extra CPU samples on the Q10 flamegraph). This causes regressions on 15 of the 43 ClickBench queries.

See profiling details.

## What changes are included in this PR?

In `opener.rs`, before calling `build_row_filter()`, check whether all predicate column indices are a subset of the projection column indices. If so:

- Skip `build_row_filter()` entirely (no RowFilter overhead)
- Apply the predicate as a post-decode `batch_filter()`

If not a subset (i.e., there are non-projected columns whose decode could be skipped), proceed with the RowFilter path as before.

ClickBench results on key regression queries (pushdown ON, fix vs baseline):

## Are these changes tested?

Yes. Added `test_skip_row_filter_when_filter_cols_subset_of_projection`, which validates that the opener skips RowFilter and applies a batch filter when all predicate columns are in the output projection, and that both paths produce correct results.

All existing tests pass (81 tests in `datafusion-datasource-parquet`).

## Are there any user-facing changes?

No. Behavior is identical: queries return the same results. Performance improves for queries where filter columns overlap with projection columns when `pushdown_filters = true`.