## Summary
Two tests in `CometExecSuite` are currently skipped with `assume(!isSpark40Plus)`:

- SparkToColumnar eliminate redundant in AQE
- SparkToColumnar override node name for row input

When the guards are removed and the tests run on Spark 4, both fail with `List() had length 0 instead of expected length 1`; that is, the tests look for exactly one `CometSparkToColumnarExec` in the final AQE plan and find zero.
## Reproducer
Both tests run the same query shape:
```scala
withSQLConf(
  SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
  CometConf.COMET_SHUFFLE_MODE.key -> "jvm") {
  val df = spark
    .range(1000)
    .selectExpr("id as key", "id % 8 as value")
    .toDF("key", "value")
    .groupBy("key")
    .count()
  df.collect()
  val planAfter = df.queryExecution.executedPlan
  val adaptivePlan = planAfter.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
  val found = adaptivePlan.collect { case c: CometSparkToColumnarExec => c }
  assert(found.length == 1)
}
```
On Spark 3.5 this produces exactly one `CometSparkToColumnarExec`; on Spark 4 it produces zero.
## Likely root cause
`RangeExec` behavior or the AQE insertion rule differs between Spark 3.5 and Spark 4 such that no `CometSparkToColumnarExec` is inserted in the final plan. Needs investigation to determine whether:

- the `CometSparkToColumnarExec` insertion rule should be adjusted for Spark 4, or
- Spark 4's `RangeExec` already produces columnar output and the test assertions are stale, or
- AQE is eliminating the wrapper in a new way.
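To tell these apart, it may help to dump the final adaptive plan and ask each node whether it claims columnar output. The sketch below is a hedged diagnostic, not a fix; it assumes it runs inside the same test context, where the `spark` session and the imports for `AdaptiveSparkPlanExec` and `CometSparkToColumnarExec` are already available.

```scala
// Diagnostic sketch (run inside CometExecSuite or a comparable test context).
// Re-runs the reproducer query, then inspects the final AQE plan.
val df = spark
  .range(1000)
  .selectExpr("id as key", "id % 8 as value")
  .groupBy("key")
  .count()
df.collect()

val adaptivePlan = df.queryExecution.executedPlan
  .asInstanceOf[AdaptiveSparkPlanExec]
  .executedPlan

// 1. See where (or whether) a row-to-columnar transition appears at all.
println(adaptivePlan.treeString)

// 2. Check whether each node already reports columnar output. If the leaf
//    (RangeExec or a Comet replacement) reports supportsColumnar = true on
//    Spark 4, the wrapper would be redundant and the assertion is stale.
adaptivePlan.foreach { node =>
  println(s"${node.nodeName}: supportsColumnar=${node.supportsColumnar}")
}
```

Comparing this output between a Spark 3.5 and a Spark 4.0 build should show whether the wrapper was never inserted or was inserted and then eliminated.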
## How to reproduce
On a branch with both `assume(!isSpark40Plus)` lines commented out in `CometExecSuite.scala:2392` and `:2482`:

```shell
./mvnw test -Pspark-4.0 -Dtest=none -Dsuites='org.apache.comet.exec.CometExecSuite SparkToColumnar'
```

Both affected tests fail; the other SparkToColumnar tests pass.