
CometSparkToColumnarExec missing from AQE plan on Spark 4 (two skipped tests) #4031

@andygrove

Summary

Two tests in CometExecSuite are currently skipped with assume(!isSpark40Plus):

  • SparkToColumnar eliminate redundant in AQE
  • SparkToColumnar override node name for row input

When the guards are removed and the tests run on Spark 4, both fail with List() had length 0 instead of expected length 1. Each test expects exactly one CometSparkToColumnarExec in the final AQE plan and finds none.

Reproducer

Both tests run the same query shape:

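```scala
import org.apache.comet.CometConf
import org.apache.spark.sql.comet.CometSparkToColumnarExec
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec
import org.apache.spark.sql.internal.SQLConf

// Body shared by both tests; runs inside CometExecSuite, which provides `spark` and `withSQLConf`.
withSQLConf(
    SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
    CometConf.COMET_SHUFFLE_MODE.key -> "jvm") {
  val df = spark
    .range(1000)
    .selectExpr("id as key", "id % 8 as value")
    .toDF("key", "value")
    .groupBy("key")
    .count()
  df.collect() // execute the query so AQE produces its final plan

  val planAfter = df.queryExecution.executedPlan
  val adaptivePlan = planAfter.asInstanceOf[AdaptiveSparkPlanExec].executedPlan
  // Plain TreeNode.collect over the final AQE plan
  val found = adaptivePlan.collect { case c: CometSparkToColumnarExec => c }
  assert(found.length == 1)
}
```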

On Spark 3.5 this produces exactly one CometSparkToColumnarExec; on Spark 4 it produces zero.

Likely root cause

Presumably either RangeExec behaves differently between Spark 3.5 and Spark 4, or the rule that inserts CometSparkToColumnarExec behaves differently under AQE, so that the final plan contains no CometSparkToColumnarExec. This needs investigation to determine whether:

  1. The CometSparkToColumnarExec insertion rule should be adjusted for Spark 4, or
  2. Spark 4's RangeExec already produces columnar output and the test assertions are stale, or
  3. AQE is eliminating the wrapper in a new way.
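To tell these options apart, it may help to dump the final AQE plan on both Spark versions and compare. The following is a minimal sketch of a hypothetical diagnostic helper; SparkToColumnarDiagnostics and Summary are illustrative names, not part of Comet or Spark. It assumes Spark's AdaptiveSparkPlanHelper, whose collect also descends into query stages, and it matches CometSparkToColumnarExec by class name so it compiles without a Comet dependency. It reports the count the failing tests see (plain TreeNode.collect), a stage-aware count, the supportsColumnar flag of each RangeExec (relevant to option 2), and the final plan text.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.{RangeExec, SparkPlan}
import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanExec, AdaptiveSparkPlanHelper}

// Hypothetical helper, not part of Comet: summarizes the final AQE plan so the
// Spark 3.5 and Spark 4 runs can be compared side by side.
object SparkToColumnarDiagnostics extends AdaptiveSparkPlanHelper {

  // Match by class name so this compiles without Comet on the classpath.
  private def isSparkToColumnar(p: SparkPlan): Boolean =
    p.getClass.getSimpleName == "CometSparkToColumnarExec"

  final case class Summary(
      topLevelCount: Int,                 // TreeNode.collect, as in the failing tests
      stageAwareCount: Int,               // AdaptiveSparkPlanHelper.collect, enters query stages
      rangeSupportsColumnar: Seq[Boolean], // one entry per RangeExec found
      finalPlan: String)

  def summarize(df: DataFrame): Summary = {
    df.collect() // execute the query so AQE finalizes the plan
    val finalPlan = df.queryExecution.executedPlan match {
      case a: AdaptiveSparkPlanExec => a.executedPlan
      case other => other
    }
    Summary(
      topLevelCount = finalPlan.collect { case p if isSparkToColumnar(p) => p }.length,
      stageAwareCount = collect(finalPlan) { case p if isSparkToColumnar(p) => p }.length,
      rangeSupportsColumnar = collect(finalPlan) { case r: RangeExec => r.supportsColumnar },
      finalPlan = finalPlan.treeString)
  }
}
```

Calling println(SparkToColumnarDiagnostics.summarize(df)) in each test just before the assertion, then diffing the Spark 3.5 and Spark 4 output, should help show whether the wrapper is missing entirely, is present only inside a query stage, or is absent alongside a RangeExec that reports columnar support. Note that summarize executes the query itself.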

How to reproduce

On a branch with both assume(!isSpark40Plus) lines commented out (CometExecSuite.scala, lines 2392 and 2482), run:

```shell
./mvnw test -Pspark-4.0 -Dtest=none -Dsuites='org.apache.comet.exec.CometExecSuite SparkToColumnar'
```

Both affected tests fail; other SparkToColumnar tests pass.
