feat: plan IN/EXISTS subquery decorrelation as RightSemi by default #21809
Dandandan wants to merge 1 commit into apache:main from
Change DecorrelatePredicateSubquery to emit RightSemi joins (subquery as the left input, outer query as the right input) instead of LeftSemi joins for non-negated IN/EXISTS predicates.

Semantics are preserved: RightSemi returns the rows of the right input (the outer query) that have a match in the left input (the subquery), which is the same output the previous LeftSemi plan produced. The NOT IN / NOT EXISTS path still uses LeftAnti (including null-aware handling) and is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
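To illustrate why the swap preserves results, here is a minimal in-memory sketch over (key, payload) rows, where `None` stands for a SQL NULL key. The functions `left_semi`, `right_semi`, and `null_aware_left_anti` are hypothetical helpers written for this example, not DataFusion APIs; they model join semantics only, not how DataFusion executes joins.

```rust
use std::collections::HashSet;

/// A row is (join key, payload); `None` models a SQL NULL key.
type Row = (Option<i64>, &'static str);

/// Collect the non-NULL join keys of `rows` (NULL keys never match in an equijoin).
fn key_set(rows: &[Row]) -> HashSet<i64> {
    rows.iter().filter_map(|r| r.0).collect()
}

/// LeftSemi: each row of `left` that has at least one match in `right`, emitted once.
fn left_semi(left: &[Row], right: &[Row]) -> Vec<Row> {
    let keys = key_set(right);
    left.iter()
        .filter(|r| r.0.map_or(false, |k| keys.contains(&k)))
        .copied()
        .collect()
}

/// RightSemi: each row of `right` that has at least one match in `left`, emitted once.
fn right_semi(left: &[Row], right: &[Row]) -> Vec<Row> {
    let keys = key_set(left);
    right.iter()
        .filter(|r| r.0.map_or(false, |k| keys.contains(&k)))
        .copied()
        .collect()
}

/// Null-aware LeftAnti for `outer.key NOT IN (subquery)`:
/// - empty subquery: every outer row qualifies (even one with a NULL key);
/// - any NULL in the subquery: the predicate is never TRUE, so no rows qualify;
/// - otherwise: non-NULL outer keys that are absent from the subquery.
fn null_aware_left_anti(outer: &[Row], sub: &[Row]) -> Vec<Row> {
    if sub.is_empty() {
        return outer.to_vec();
    }
    if sub.iter().any(|r| r.0.is_none()) {
        return Vec::new();
    }
    let keys = key_set(sub);
    outer.iter()
        .filter(|r| matches!(r.0, Some(k) if !keys.contains(&k)))
        .copied()
        .collect()
}

fn main() {
    let outer: Vec<Row> = vec![(Some(1), "a"), (Some(2), "b"), (Some(2), "c"), (None, "d")];
    let sub: Vec<Row> = vec![(Some(2), "x"), (Some(2), "y"), (Some(3), "z")];

    // Previous plan shape: LeftSemi(outer, subquery). New shape: RightSemi(subquery, outer).
    let old_plan = left_semi(&outer, &sub);
    let new_plan = right_semi(&sub, &outer);
    assert_eq!(old_plan, new_plan);
    println!("{:?}", new_plan); // [(Some(2), "b"), (Some(2), "c")]

    // NOT IN keeps using a (null-aware) LeftAnti with the outer query on the left.
    println!("{:?}", null_aware_left_anti(&outer, &sub)); // [(Some(1), "a")]
}
```

Because a semi join emits each qualifying row at most once, the duplicate key 2 in the subquery does not duplicate outer rows under either plan shape.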
run benchmark tpch tpch10 tpcds
🤖 Benchmark running (GKE): comparing right_semi_default (265fe55) to 067ba4b (merge-base) using tpch

🤖 Benchmark running (GKE): comparing right_semi_default (265fe55) to 067ba4b (merge-base) using tpch10

🤖 Benchmark running (GKE): comparing right_semi_default (265fe55) to 067ba4b (merge-base) using tpcds
🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch

🤖 Benchmark completed (GKE): tpch10, base (merge-base) vs. branch

🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
Let's keep it as is for now.
Which issue does this PR close?
Rationale for this change
Are these changes tested?
Yes. Existing tests were updated to the new plan shape:
- `cargo test -p datafusion-optimizer` (unit and integration snapshots regenerated via `cargo insta`)
- `cargo test --test sqllogictests -p datafusion-sqllogictest` (all 463 SLT files)
- `INCLUDE_TPCH=true cargo test --test sqllogictests -p datafusion-sqllogictest` (all 464 SLT files, including the TPC-H plan snapshots)
- `cargo fmt --all` / `cargo clippy -p datafusion-optimizer --all-targets -- -D warnings`

Are there any user-facing changes?
The optimized logical/physical plan for IN/EXISTS subqueries now shows RightSemi with the subquery as the left child. Query results are unchanged.
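The new plan shape can also be pictured from the execution side. Assuming a hash join that builds its table from the left input and streams the right input (DataFusion's HashJoinExec treats its left input as the build side), a RightSemi with the subquery on the left hashes the subquery keys once and then emits matching outer rows as they are probed. `right_semi_hash_join` below is a hypothetical single-threaded sketch of that pattern, not DataFusion's implementation.

```rust
use std::collections::HashSet;

/// Hypothetical build-left / probe-right RightSemi hash join.
/// `build` holds the left child's join keys (the subquery);
/// `probe` holds the right child's rows (the outer query) as (key, payload).
fn right_semi_hash_join<T: Clone>(
    build: &[Option<i64>],
    probe: &[(Option<i64>, T)],
) -> Vec<(Option<i64>, T)> {
    // Build phase: hash the subquery keys once; NULL keys are skipped because they never match.
    let table: HashSet<i64> = build.iter().flatten().copied().collect();

    // Probe phase: stream the outer rows and keep each one whose key is in the table.
    // A semi join emits a probe row at most once, however many build rows share its key.
    probe
        .iter()
        .filter(|(key, _)| key.map_or(false, |k| table.contains(&k)))
        .cloned()
        .collect()
}

fn main() {
    // Left (build) child: subquery keys, including a duplicate and a NULL.
    let subquery_keys: Vec<Option<i64>> = vec![Some(2), Some(2), None, Some(3)];
    // Right (probe) child: outer query rows.
    let outer: Vec<(Option<i64>, &str)> =
        vec![(Some(1), "a"), (Some(2), "b"), (Some(3), "c"), (None, "d")];

    let result = right_semi_hash_join(&subquery_keys, &outer);
    println!("{:?}", result); // [(Some(2), "b"), (Some(3), "c")]
}
```

In this sketch only the subquery keys are held in memory, while outer rows are filtered as they stream past, in their original order.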