Summary
With native_datafusion, a scalar subquery pushed down as a data filter on CometNativeScanExec does not produce a ReusedSubqueryExec the way Spark's vectorized reader (and CometScanExec) do. The pushed subquery is a plain Subquery, so subsequent references to the same subquery do not share the result.
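To make the expected behavior concrete, here is a toy Scala model of subquery reuse. As we understand it, this loosely follows what Spark's `ReuseExchangeAndSubquery` rule does: subqueries are keyed by a canonical form, the first instance is kept, and later duplicates are replaced by a reference to it. `Subquery`, `ReusedSubquery` and `reuseSubqueries` are hypothetical stand-ins, not Spark classes. The SQL string stands in for Spark's canonicalized plan.

```scala
import scala.collection.mutable

// Hypothetical model, not Spark's classes.
sealed trait SubqueryRef
final case class Subquery(id: Int, canonical: String) extends SubqueryRef
final case class ReusedSubquery(original: Subquery) extends SubqueryRef

def reuseSubqueries(refs: Seq[Subquery]): Seq[SubqueryRef] = {
  // Canonical form -> first subquery seen with that form.
  val seen = mutable.Map.empty[String, Subquery]
  refs.map { s =>
    seen.get(s.canonical) match {
      case Some(first) => ReusedSubquery(first) // later duplicate shares the first result
      case None =>
        seen(s.canonical) = s // first occurrence stays a plain subquery
        s
    }
  }
}

// Two references to the same scalar subquery, e.g. one in a Filter and one
// pushed into a scan's data filters.
val rewritten = reuseSubqueries(Seq(
  Subquery(1, "SELECT min(c2) FROM t"),
  Subquery(2, "SELECT min(c2) FROM t")
))
```

In this model the second reference comes back as `ReusedSubquery(Subquery(1, ...))`, which is the shape the test expects for the pushed filter. With `native_datafusion`, the pushed filter instead stays a plain `Subquery`.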
Failing Test
SubquerySuite: "SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery"
Reproduction
Extending the test's plan pattern match to include CometNativeScanExec:
```scala
val dataSourceScanExec = collect(df.queryExecution.executedPlan) {
  case f: FileSourceScanLike => f
  case c: CometScanExec => c
  case n: CometNativeScanExec => n
}
```
makes the first assertion (dataSourceScanExec.size == 1) pass. The next assertion still fails:
was not instance of org.apache.spark.sql.execution.ReusedSubqueryExec (SubquerySuite.scala:2716)
with the plan showing a plain Subquery rather than ReusedSubqueryExec:
```
Subquery subquery#295, [id=#166]
+- AdaptiveSparkPlan isFinalPlan=true
   +- == Final Plan ==
      ResultQueryStage 2
      +- CometNativeColumnarToRow
         +- CometHashAggregate [min#303], Final, [min(c2#297)]
            +- ShuffleQueryStage 0
               +- CometExchange SinglePartition, ...
                  +- CometHashAggregate [c2#297], Partial, [partial_min(c2#297)]
                     +- CometNativeScan parquet ...
```
The dataFilters on the CometNativeScanExec carry the subquery reference but aren't wired into the reused-subquery machinery.
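One hypothetical way a pushed filter can miss reuse is for the pass that rewrites subqueries never to visit the expressions that hold it. The sketch below models only that assumption; it is not a diagnosis of Comet's code. `Node`, `exposed`, `PlainSubquery` and `reusePass` are illustrative names, not Comet or Spark APIs, and the visiting order is simplified.

```scala
import scala.collection.mutable

// Hypothetical model: names are illustrative, not Comet or Spark APIs.
sealed trait SubqueryRef { def canonical: String }
final case class PlainSubquery(id: Int, canonical: String) extends SubqueryRef
final case class ReusedSubquery(original: PlainSubquery) extends SubqueryRef {
  def canonical: String = original.canonical
}

final case class Node(
    name: String,
    exposed: Seq[SubqueryRef],    // expressions the reuse pass visits
    dataFilters: Seq[SubqueryRef] // held separately; the pass never visits them
)

def reusePass(plan: Seq[Node]): Seq[Node] = {
  val seen = mutable.Map.empty[String, PlainSubquery]
  def reuse(ref: SubqueryRef): SubqueryRef = ref match {
    case p: PlainSubquery =>
      val first = seen.getOrElseUpdate(p.canonical, p)
      if (first eq p) p else ReusedSubquery(first)
    case r: ReusedSubquery => r
  }
  // Only `exposed` is rewritten; `dataFilters` is copied through untouched.
  plan.map(n => n.copy(exposed = n.exposed.map(reuse)))
}

val q = "SELECT min(c2) FROM t"
val before = Seq(
  Node("Filter", exposed = Seq(PlainSubquery(1, q)), dataFilters = Nil),
  Node("Scan", exposed = Nil, dataFilters = Seq(PlainSubquery(2, q)))
)

// The scan's pushed filter is invisible to the pass, so it stays plain.
val after = reusePass(before)

// Exposing the pushed filter to the pass lets the second reference be reused.
val wired = reusePass(before.map(n => n.copy(exposed = n.exposed ++ n.dataFilters)))
```

Under this model, making the pushed data filters visible to the rewrite is enough for the scan's reference to become a reused subquery. Whether that is the actual gap in `CometNativeScanExec` remains to be confirmed.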
Related
Split from #3315 while triaging the tests previously ignored under #3321.