
[SPARK-55903][SQL] Simplify MERGE Schema Evolution and Check Write Privileges#54704

Closed
szehon-ho wants to merge 4 commits into apache:master from szehon-ho:anton_refactor_merge_1

Conversation

@szehon-ho
Member

@szehon-ho szehon-ho commented Mar 9, 2026

What changes were proposed in this pull request?

Some simplifications for MERGE INTO schema evolution, plus minor bug fixes.

  • The biggest cleanup is removing `needSchemaEvolution`, `canEvaluateSchemaEvolution`, and `changesForSchemaEvolution`. This three-part state existed because the schema evolution rule had to wait for the analyzer to resolve every reference it could (except the unresolved target references that schema evolution itself would fix), so it needed an awkward `canEvaluateSchemaEvolution` flag to block until that happened. Now the code keeps a single `pendingChanges` collection that is empty in both cases: 1) the analyzer has not yet resolved enough references for the schema evolution code to proceed, and 2) we have enough information but decide schema evolution is not needed because there are no valid changes.
  • Load the table from the catalog with write privileges; previously it did not, so schema evolution could be performed without the required privilege.
  • Catch and wrap the exception properly
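The single-collection state described above can be sketched roughly as follows. This is an illustrative sketch only: `PendingChangesSketch`, `TableChange`, `MergeState`, and `needsEvolution` are hypothetical stand-ins, not the actual Spark/Catalyst classes.

```scala
object PendingChangesSketch {
  // Hypothetical stand-in for a catalog table change; not Spark's class.
  final case class TableChange(column: String, newType: String)

  // Hypothetical merge-analysis state: one pendingChanges collection replaces
  // the old needSchemaEvolution / canEvaluateSchemaEvolution /
  // changesForSchemaEvolution trio.
  final case class MergeState(pendingChanges: Seq[TableChange]) {
    // Empty covers both "analyzer has not resolved enough yet" and
    // "resolved, but no valid changes", so schema evolution runs only
    // when there is actually something to apply.
    def needsEvolution: Boolean = pendingChanges.nonEmpty
  }
}
```

Under this model the rule no longer needs a separate blocking flag: an empty `pendingChanges` simply means "do nothing yet", whichever of the two cases produced it.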

Why are the changes needed?

Discussed the state of Spark 4.1 schema evolution with @aokolnychyi, and he suggested these changes because the code is currently confusing. Not using the write privileges is also wrong.

Does this PR introduce any user-facing change?

Write privilege is now enforced (for any system that uses DSV2 privileges). Error messages are changed.

How was this patch tested?

Ran existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

```scala
val writePrivileges = MergeIntoTable.getWritePrivileges(merge)
catalog.loadTable(ident, writePrivileges.toSet.asJava)
} catch {
case e: IllegalArgumentException if !e.isInstanceOf[SparkThrowable] =>
```
Member Author

This is to keep in line with AlterTableExec's exception handling:

```scala
case e: IllegalArgumentException if !e.isInstanceOf[SparkThrowable] =>
```

But let me know if it is unnecessary.
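For context, a minimal self-contained sketch of this catch-and-wrap pattern. The `SparkThrowable` trait here is a stand-in for `org.apache.spark.SparkThrowable`, and `loadOrWrap` is a hypothetical helper; the real code would raise a proper Spark error class rather than a bare `RuntimeException`.

```scala
object ErrorWrapSketch {
  // Stand-in for org.apache.spark.SparkThrowable, to keep the sketch self-contained.
  trait SparkThrowable

  // Hypothetical helper illustrating the pattern: wrap plain
  // IllegalArgumentExceptions thrown while loading, but let exceptions that
  // already implement SparkThrowable (i.e. already carry a Spark error class)
  // propagate unchanged.
  def loadOrWrap[T](load: () => T): T =
    try load()
    catch {
      case e: IllegalArgumentException if !e.isInstanceOf[SparkThrowable] =>
        throw new RuntimeException(s"Failed to load table: ${e.getMessage}", e)
    }
}
```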

Contributor

Good point.

```scala
override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
  case m @ MergeIntoTable(aliasedTable, source, cond, matchedActions, notMatchedActions,
      notMatchedBySourceActions, _) if m.resolved && m.rewritable && m.aligned &&
      !m.needSchemaEvolution && matchedActions.isEmpty && notMatchedActions.size == 1 &&
```
Contributor

I know we talked about this. Just to confirm: removing `!m.needSchemaEvolution` is safe because `aligned` would be false otherwise?

Member Author

Yes, I also checked with AI here:

Yes. It’s safe even without assuming rule order.

Reasoning:

  • aligned is defined as: every action’s assignments align with the current targetTable.output (same length, matching attribute names, compatible types).
  • Pending schema evolution means the catalog target hasn’t been evolved yet, so in the plan targetTable.output is still the pre‑evolution schema.
  • Assignments that refer to new columns or evolved types therefore cannot align with that current target (e.g. different length or incompatible types), so aligned is false whenever there are pending schema changes.
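A toy model of this argument. `AlignmentSketch`, `Assignment`, and this `aligned` function are simplified illustrative stand-ins for Catalyst's alignment check, not the real implementation.

```scala
object AlignmentSketch {
  // One assignment per target column, identified here just by name.
  final case class Assignment(targetColumn: String)

  // Simplified alignment: same number of assignments as target columns,
  // and every assigned column must exist in the current (pre-evolution)
  // target output.
  def aligned(targetOutput: Seq[String], assignments: Seq[Assignment]): Boolean =
    assignments.size == targetOutput.size &&
      assignments.forall(a => targetOutput.contains(a.targetColumn))
}
```

An assignment to a not-yet-added column (say `email` against a target of `id, name`) fails both conditions, so the plan stays unaligned whenever schema changes are still pending.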

Contributor

Thanks for confirming!


```scala
val keyPath = extractFieldPath(assignment.key, allowUnresolved = true)
// value should always be resolved (from source)
val valuePath = extractFieldPath(assignment.value, allowUnresolved = false)
keyPath == valuePath && assignment.value.references.subsetOf(source.outputSet)
```
Contributor

Just to double check: `assignment.value.references.subsetOf(source.outputSet)` works for nested fields?

Member Author

Yes. Also from AI, here is how it works:

```sql
MERGE ... UPDATE SET addr.city = source.addr.city
```

  • Key: nested path addr.city, e.g. GetStructField(GetStructField(target.addr, ...), ...)
  • Value: same path on source, e.g. GetStructField(GetStructField(source.addr, 0), 1)

This may be possible to improve; we'll have to explore it in the next PR.
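A minimal model of why `references` covers nested access: struct-field extraction delegates `references` to its child, so the chain bottoms out at the base attribute. These are toy expression classes for illustration, not Catalyst's.

```scala
object NestedRefsSketch {
  sealed trait Expr { def references: Set[String] }

  // Base column reference; its only reference is itself.
  final case class AttributeRef(name: String) extends Expr {
    def references: Set[String] = Set(name)
  }

  // Struct-field extraction: references come from the child, so a nested
  // chain like source.addr.city still reports the base attribute `addr`.
  final case class GetStructField(child: Expr, field: String) extends Expr {
    def references: Set[String] = child.references
  }
}
```

So for `source.addr.city`, the nested value's reference set is just `{addr}`, which is a subset of the source's output whenever `addr` comes from the source.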

@aokolnychyi changed the title from "[SPARk-55903][SQL] Simplify Merge Into Schema Evolution and Check Write Privileges" to "[SPARK-55903][SQL] Simplify Merge Into Schema Evolution and Check Write Privileges" on Mar 10, 2026
@aokolnychyi changed the title from "[SPARK-55903][SQL] Simplify Merge Into Schema Evolution and Check Write Privileges" to "[SPARK-55903][SQL] Simplify MERGE Schema Evolution and Check Write Privileges" on Mar 10, 2026
@aokolnychyi
Contributor

Thanks, @szehon-ho! Merged to master.

johanl-db added a commit to johanl-db/spark that referenced this pull request Mar 11, 2026