native_datafusion: scalar subquery pushdown does not produce ReusedSubqueryExec #4042

@andygrove

Summary

With native_datafusion, a scalar subquery pushed down as a data filter on CometNativeScanExec does not produce a ReusedSubqueryExec the way Spark's vectorized reader (and CometScanExec) do. The pushed subquery is a plain Subquery, so subsequent references to the same subquery do not share the result.

Failing Test

SubquerySuite: "SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery"

Reproduction

Updating the test's plan-match to include CometNativeScanExec:

val dataSourceScanExec = collect(df.queryExecution.executedPlan) {
  case f: FileSourceScanLike => f
  case c: CometScanExec => c
  case n: CometNativeScanExec => n
}

makes the first assertion (dataSourceScanExec.size == 1) pass. The next assertion still fails:

was not instance of org.apache.spark.sql.execution.ReusedSubqueryExec (SubquerySuite.scala:2716)

with the plan showing a plain Subquery rather than ReusedSubqueryExec:

Subquery subquery#295, [id=#166]
+- AdaptiveSparkPlan isFinalPlan=true
   +- == Final Plan ==
      ResultQueryStage 2
      +- CometNativeColumnarToRow
         +- CometHashAggregate [min#303], Final, [min(c2#297)]
            +- ShuffleQueryStage 0
               +- CometExchange SinglePartition, ...
                  +- CometHashAggregate [c2#297], Partial, [partial_min(c2#297)]
                     +- CometNativeScan parquet ...

The dataFilters on the CometNativeScanExec carry the subquery reference but aren't wired into the reused-subquery machinery.
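To make the expected vs. actual behaviour concrete, here is a minimal, dependency-free sketch of canonical-key subquery reuse. Everything in it is hypothetical: `PlainSubquery`, `ReusedSubquery`, `Node`, `reuseSubqueries`, and the `wiredIntoReuse` flag are toy stand-ins, not Spark or Comet classes, and the flag only models "not wired in" without claiming what the actual cause is.

```scala
// Toy model only: none of these types are Spark or Comet classes.
object SubqueryReuseSketch {
  sealed trait SubqueryPlan { def canonicalKey: String }

  // A subquery that is planned and executed on its own.
  final case class PlainSubquery(name: String, canonicalKey: String) extends SubqueryPlan

  // A wrapper that points at an already-seen, semantically equal subquery.
  final case class ReusedSubquery(child: PlainSubquery) extends SubqueryPlan {
    def canonicalKey: String = child.canonicalKey
  }

  // A plan node whose expressions reference subqueries.
  // wiredIntoReuse = false models a node that the reuse pass does not rewrite (assumption).
  final case class Node(
      name: String,
      subqueries: Vector[SubqueryPlan],
      wiredIntoReuse: Boolean = true)

  // Visit nodes in order; the first subquery per canonical key stays plain,
  // later distinct instances with the same key are replaced by ReusedSubquery.
  def reuseSubqueries(nodes: Vector[Node]): Vector[Node] = {
    val seen = scala.collection.mutable.Map.empty[String, PlainSubquery]
    nodes.map { node =>
      if (!node.wiredIntoReuse) node
      else node.copy(subqueries = node.subqueries.map {
        case p: PlainSubquery =>
          val first = seen.getOrElseUpdate(p.canonicalKey, p)
          if (first eq p) p else ReusedSubquery(first)
        case r: ReusedSubquery => r
      })
    }
  }

  // The same scalar subquery appears in the Filter and as a pushed-down data filter on the scan.
  val filter: Node = Node("Filter", Vector(PlainSubquery("sq-a", "min(c2)")))
  val scan: Node = Node("Scan", Vector(PlainSubquery("sq-b", "min(c2)")))

  // Expected shape: the scan's copy becomes a ReusedSubquery.
  val wired: Vector[Node] = reuseSubqueries(Vector(filter, scan))

  // Shape reported in this issue: the scan's copy stays a plain subquery.
  val unwired: Vector[Node] = reuseSubqueries(Vector(filter, scan.copy(wiredIntoReuse = false)))
}
```

In the `wired` case the scan's subquery is a `ReusedSubquery`, which is the shape the failing assertion checks for; in the `unwired` case it stays a `PlainSubquery`, matching the plan above.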

Related

Split from #3315 while triaging the tests previously ignored under #3321.

Labels: bug, priority:medium, spark 4
