[SPARK-55903][SQL] Simplify MERGE Schema Evolution and Check Write Privileges #54704
szehon-ho wants to merge 4 commits into apache:master
Conversation
    val writePrivileges = MergeIntoTable.getWritePrivileges(merge)
    catalog.loadTable(ident, writePrivileges.toSet.asJava)
  } catch {
    case e: IllegalArgumentException if !e.isInstanceOf[SparkThrowable] =>
This is to keep in line with AlterTableExec's exception handling, but let me know if it is unnecessary.
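To illustrate the idea behind the snippet above, here is a hedged, self-contained sketch of how MERGE actions might map to DSV2 write privileges. The types (`MergeAction`, `WritePrivilege`) and the `getWritePrivileges` helper are simplified stand-ins, not Spark's actual `MergeIntoTable.getWritePrivileges` implementation: each kind of action contributes the corresponding privilege, and the union is passed to `catalog.loadTable` so the data source can enforce access up front.

```scala
// Hypothetical simplified model, NOT Spark's actual classes.
sealed trait MergeAction
case object InsertAction extends MergeAction
case object UpdateAction extends MergeAction
case object DeleteAction extends MergeAction

sealed trait WritePrivilege
case object Insert extends WritePrivilege
case object Update extends WritePrivilege
case object Delete extends WritePrivilege

// Union of privileges implied by the MERGE's actions.
def getWritePrivileges(actions: Seq[MergeAction]): Set[WritePrivilege] =
  actions.map {
    case InsertAction => Insert
    case UpdateAction => Update
    case DeleteAction => Delete
  }.toSet

// A MERGE with only UPDATE and INSERT actions should not request DELETE.
val privileges = getWritePrivileges(Seq(UpdateAction, InsertAction))
```

The point of requesting the union at `loadTable` time is that a privilege failure surfaces during analysis rather than mid-write.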
  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    case m @ MergeIntoTable(aliasedTable, source, cond, matchedActions, notMatchedActions,
        notMatchedBySourceActions, _) if m.resolved && m.rewritable && m.aligned &&
      !m.needSchemaEvolution && matchedActions.isEmpty && notMatchedActions.size == 1 &&
I know we talked about this. Just to confirm: removing `!m.needSchemaEvolution` is safe because `aligned` would be false otherwise?
Yes, I also checked with AI here:
Yes. It's safe even without assuming rule order.
Reasoning:
- aligned is defined as: every action’s assignments align with the current targetTable.output (same length, matching attribute names, compatible types).
- Pending schema evolution means the catalog target hasn’t been evolved yet, so in the plan targetTable.output is still the pre‑evolution schema.
- Assignments that refer to new columns or evolved types therefore cannot align with that current target (e.g. different length or incompatible types), so aligned is false whenever there are pending schema changes.
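The reasoning above can be made concrete with a small sketch. The `Assignment` type and `aligned` function below are hypothetical simplifications, not Spark's actual alignment rule: alignment compares each action's assignments against the *current* `targetTable.output`, which is still the pre-evolution schema while evolution is pending, so an assignment to a new column cannot align.

```scala
// Hypothetical simplified model of alignment, NOT Spark's actual rule.
case class Assignment(targetCol: String, valueType: String)

// Aligned: same number of assignments as target columns, each matching
// its column's name and type positionally.
def aligned(targetSchema: Seq[(String, String)], assignments: Seq[Assignment]): Boolean =
  assignments.length == targetSchema.length &&
    targetSchema.zip(assignments).forall { case ((name, tpe), a) =>
      a.targetCol == name && a.valueType == tpe
    }

// Pre-evolution target has only (id, name); the MERGE also assigns a new
// column "age", so lengths differ and aligned is false.
val preEvolution = Seq("id" -> "int", "name" -> "string")
val assignments =
  Seq(Assignment("id", "int"), Assignment("name", "string"), Assignment("age", "int"))

val alignedBefore = aligned(preEvolution, assignments)                     // false
val alignedAfter  = aligned(preEvolution :+ ("age" -> "int"), assignments) // true
```

Under this model, `aligned` only becomes true after the catalog table has actually been evolved, which is exactly why `!m.needSchemaEvolution` is redundant in the guard.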
Thanks for confirming!
    val keyPath = extractFieldPath(assignment.key, allowUnresolved = true)
    // value should always be resolved (from source)
    val valuePath = extractFieldPath(assignment.value, allowUnresolved = false)
    keyPath == valuePath && assignment.value.references.subsetOf(source.outputSet)
Just to double check: assignment.value.references.subsetOf(source.outputSet) works for nested?
Yes. Also, from an AI check, here is how it works:
MERGE ... UPDATE SET addr.city = source.addr.city
- Key: nested path addr.city, e.g. GetStructField(GetStructField(target.addr, ...), ...)
- Value: same path on source, e.g. GetStructField(GetStructField(source.addr, 0), 1)
This may be possible to improve; we'll have to explore it in the next PR.
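To show why the `references.subsetOf(source.outputSet)` check covers nested fields, here is a hedged sketch with a toy expression tree. `Attr` and `GetField` are hypothetical stand-ins for Spark's `AttributeReference` and `GetStructField`: a chain of field extractions bubbles its references up to the base attribute, so a nested value like `source.addr.city` still reduces to the top-level attribute `addr`.

```scala
// Hypothetical simplified expression tree, NOT Spark's Expression API.
sealed trait Expr { def references: Set[String] }
case class Attr(name: String) extends Expr {
  def references: Set[String] = Set(name)
}
case class GetField(child: Expr, field: String) extends Expr {
  // A struct-field access refers to whatever its child refers to.
  def references: Set[String] = child.references
}

// Recover the dotted path (e.g. addr.city) by walking down GetField wrappers,
// analogous to the extractFieldPath helper in the diff above.
def fieldPath(e: Expr): Seq[String] = e match {
  case Attr(n)        => Seq(n)
  case GetField(c, f) => fieldPath(c) :+ f
}

val sourceOutput = Set("addr", "name")     // source's top-level attributes
val value = GetField(Attr("addr"), "city") // value side of addr.city = source.addr.city

val path = fieldPath(value)                        // Seq("addr", "city")
val nestedOk = value.references.subsetOf(sourceOutput)
```

Because `references` collapses the nested access to its base attribute, the subset check against `source.outputSet` behaves the same for `addr.city` as it would for a top-level column.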
Thanks, @szehon-ho! Merged to master.
What changes were proposed in this pull request?
Some simplifications for MERGE INTO schema evolution, plus minor bug fixes.
Why are the changes needed?
Discussed with @aokolnychyi on the state of Spark 4.1 schema evolution, and he suggested these changes, as the code is currently confusing. Not using the write privileges is also wrong.
Does this PR introduce any user-facing change?
Write privileges are now enforced (for any system that uses DSV2 privileges). Error messages are changed.
How was this patch tested?
Ran existing unit tests.
Was this patch authored or co-authored using generative AI tooling?
No