[SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution by xupefei · Pull Request #45748 · apache/spark

xupefei · 2024-03-28T09:52:38Z

Why are the changes needed?

This PR introduces a syntax WITH SCHEMA EVOLUTION to the SQL MERGE command, which allows the user to specify automatic schema evolution for a specific operation.

MERGE WITH SCHEMA EVOLUTION
INTO tgt
USING src
ON ...
WHEN ...

When WITH SCHEMA EVOLUTION is specified, schema evolution-related features must be turned on for this single statement and only in this statement.

Spark is only responsible for recognizing the presence or absence of the syntax WITH SCHEMA EVOLUTION, and the result is passed down to the MERGE command. Data sources must respect the syntax and react accordingly: turn on features that are categorised as "schema evolution" when the syntax is present. For example, when the underlying table is Delta Lake, the feature "mergeSchema" will be turned on (see https://github.com/delta-io/delta/blob/c41977db3529a3139d6306abe5ded161f070982a/spark/src/main/scala/org/apache/spark/sql/delta/DeltaAnalysis.scala#L538).
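
The division of responsibility described above can be sketched in a few lines of Python. This is only an illustrative model, not Spark's parser or plan nodes: `MergeCommand`, `parse_merge_header`, and `data_source_options` are hypothetical names, and the regular expression covers only the statement header. The idea is that the engine records a boolean for the presence of the syntax, and the data source decides what that boolean switches on.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in for a parsed MERGE command; not Spark's actual plan node.
@dataclass(frozen=True)
class MergeCommand:
    target: str
    source: str
    with_schema_evolution: bool


# Matches only the header: MERGE [WITH SCHEMA EVOLUTION] INTO <tgt> USING <src>
_MERGE_HEADER = re.compile(
    r"^\s*MERGE\s+(?P<evo>WITH\s+SCHEMA\s+EVOLUTION\s+)?"
    r"INTO\s+(?P<tgt>\w+)\s+USING\s+(?P<src>\w+)",
    re.IGNORECASE,
)


def parse_merge_header(sql: str) -> MergeCommand:
    """Engine side: only record whether the syntax is present."""
    m = _MERGE_HEADER.match(sql)
    if m is None:
        raise ValueError("not a MERGE statement")
    return MergeCommand(m.group("tgt"), m.group("src"), m.group("evo") is not None)


def data_source_options(cmd: MergeCommand) -> dict:
    """Data source side (assumed behaviour): map the flag to its own feature switch."""
    return {"mergeSchema": cmd.with_schema_evolution}
```

Under these assumptions, `parse_merge_header("MERGE WITH SCHEMA EVOLUTION INTO tgt USING src ON ...")` yields a command with the flag set, and the same statement without the clause yields one with the flag unset, so the behaviour stays scoped to a single statement.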

Does this PR introduce any user-facing change?

Yes, see the previous section.

How was this patch tested?

New tests.

Was this patch authored or co-authored using generative AI tooling?

No.

n-young-db

Stamping, but you probably need a Spark committer to review this.

gengliangwang · 2024-04-16T17:53:05Z

@xupefei could you provide more details in the PR description? For example, what is the difference with/without WITH SCHEMA EVOLUTION?

xupefei · 2024-04-16T20:48:12Z

> @xupefei could you provide more details in the PR description? For example, what is the difference with/without WITH SCHEMA EVOLUTION

Hi @gengliangwang, I improved the PR description as you advised. Please have a look!

gengliangwang · 2024-04-17T17:52:43Z

Thanks, merging to master

dongjoon-hyun · 2024-04-17T18:05:56Z

cc @huaxingao , @RussellSpitzer

…Data Source

### What changes were proposed in this pull request?

Add support for schema evolution for data sources that support MERGE INTO, currently V2 DataSources. This means that if the SOURCE table of a merge has a different schema than the TARGET table, the TARGET table can automatically update to take into account the new or different fields.

The basic idea is to add:
- TableCapability.MERGE_SCHEMA_EVOLUTION, to indicate that a DSV2 table wants Spark to handle schema evolution for MERGE
- the ResolveMergeIntoSchemaEvolution rule, which generates DSV2 TableChanges and calls Catalog.alterTable

For any new field in the top level or in a nested struct, Spark will add the field to the end.

TODOs:
1. This currently does not support the case where SOURCE has a missing nested field from TARGET, and there is an UPDATE or INSERT star. Example:
```
MERGE INTO TARGET t USING SOURCE s
// s = struct('a', struct('b': Int))
// t = struct('a', struct('c', int))
```
will only work if the user specifies a value explicitly for the new nested field t.b for INSERT and UPDATE, i.e.
```
INSERT (s) VALUES (nested_struct('a', nested_struct('b', 1, 'c', 2)))
UPDATE SET a.b = 2
```
and not if they use INSERT * or UPDATE SET *.
2. Type widening is not allowed for the moment, as we need to decide which widenings to allow. We can take this in a follow-on PR.

### Why are the changes needed?

#45748 added the syntax 'WITH SCHEMA EVOLUTION' to 'MERGE INTO'. However, this requires some external Spark extension to resolve Merge, and doesn't do anything in Spark's native MERGE implementation.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added many tests to MergeIntoTableSuiteBase

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #51698 from szehon-ho/merge_schema_evolution.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
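
The "add new fields to the end" behaviour described in this commit message can be illustrated with a small Python sketch. It is not the ResolveMergeIntoSchemaEvolution rule itself: schemas are modelled here as plain ordered dictionaries (field name mapped to a type name or a nested dictionary), and `evolve` is a hypothetical helper. It keeps the target's existing types unchanged, which mirrors the stated restriction that type widening is not allowed, and it does not model the INSERT * / UPDATE SET * limitation.

```python
from typing import Any, Dict

# Assumed representation: field name -> type name (str) or nested schema (dict).
Schema = Dict[str, Any]


def evolve(target: Schema, source: Schema) -> Schema:
    """Return the target schema with source-only fields appended at the end."""
    result: Schema = {}
    # Existing target fields keep their position and type (no type widening).
    for name, t_type in target.items():
        s_type = source.get(name)
        if isinstance(t_type, dict) and isinstance(s_type, dict):
            result[name] = evolve(t_type, s_type)  # recurse into nested structs
        else:
            result[name] = t_type
    # Fields that exist only in the source are appended after the target's fields.
    for name, s_type in source.items():
        if name not in target:
            result[name] = s_type
    return result
```

For a target shaped like `{"a": {"c": "int"}}` and a source shaped like `{"a": {"b": "int"}}`, this sketch produces a nested struct whose fields are `c` followed by `b`, matching the append-at-end rule.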

### What changes were proposed in this pull request?

Similar to the [MERGE WITH SCHEMA EVOLUTION PR](#45748), **this PR introduces a syntax `WITH SCHEMA EVOLUTION` to the SQL `INSERT` command.** Since this syntax is not fully implemented for any table formats yet, **users will receive an exception if they try to use it.**

When `WITH SCHEMA EVOLUTION` is specified, schema evolution-related features must be turned on for this single statement and only in this statement.

**In this PR, Spark is only responsible for recognizing the existence or absence of the syntax WITH SCHEMA EVOLUTION**, and the recognition info is passed down from the `Analyzer`. When `WITH SCHEMA EVOLUTION` is detected, Spark sets the `mergeSchema` write option to `true` in the respective V2 Insert Command nodes.

Data sources must respect the syntax and give appropriate reactions: Turn on features that are categorised as "schema evolution" when the `WITH SCHEMA EVOLUTION` Syntax exists.

### Why are the changes needed?

This intuitive SQL Syntax allows the user to specify Automatic Schema Evolution for a specific `INSERT` operation.

Some users would like Schema Evolution for DML commands like `MERGE`, `INSERT`,... where the schema between the table and query relations can mismatch.

### Does this PR introduce _any_ user-facing change?

Yes, Introducing the SQL Syntax `WITH SCHEMA EVOLUTION` to SQL `INSERT`.

### How was this patch tested?

Added UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53732 from longvu-db/insert-schema-evolution.

Lead-authored-by: Thang Long VU <long.vu@databricks.com>
Co-authored-by: Thang Long Vu <107926660+longvu-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
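
As a rough illustration of the write-option step in this commit message, the following Python sketch adds `mergeSchema` to a statement's write options only when the clause was detected. `insert_write_options` is a hypothetical helper, not a Spark API, and overriding a user-supplied `mergeSchema` value is an assumption of this sketch rather than something the message states.

```python
from typing import Dict


def insert_write_options(
    user_options: Dict[str, str], with_schema_evolution: bool
) -> Dict[str, str]:
    """Return the write options for one INSERT statement (hypothetical helper)."""
    options = dict(user_options)  # copy, so the flag never leaks into other statements
    if with_schema_evolution:
        # Assumption: the clause takes precedence over any user-provided value.
        options["mergeSchema"] = "true"
    return options
```

Copying the options per statement reflects the requirement that schema evolution is enabled "for this single statement and only in this statement".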

.

505a9f6

github-actions bot added the SQL and DOCS labels Mar 28, 2024

xupefei added 3 commits March 28, 2024 14:26

fix tests

27dc21d

.

66592bf

fix tests

f83fa09

github-actions bot added the CONNECT label Mar 28, 2024

n-young-db approved these changes Mar 28, 2024

xupefei changed the title ~~[WIP][SPARK-47627] Add SQL MERGE syntax to enable schema evolution~~ [SPARK-47627] Add SQL MERGE syntax to enable schema evolution Mar 28, 2024

HyukjinKwon changed the title ~~[SPARK-47627] Add SQL MERGE syntax to enable schema evolution~~ [SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution Mar 29, 2024

xupefei added 3 commits April 2, 2024 11:26

.

498cb14

Merge branch 'apache:master' into merge-schema-evolution

f84fb19

Merge branch 'master' into merge-schema-evolution

e5efbec

srielau approved these changes Apr 15, 2024

gengliangwang approved these changes Apr 17, 2024

gengliangwang closed this in 898838a Apr 17, 2024

xupefei deleted the merge-schema-evolution branch August 30, 2024 14:49

szehon-ho mentioned this pull request Jul 29, 2025

[SPARK-52991][SQL] Implement MERGE INTO with SCHEMA EVOLUTION for V2 Data Source #51698

Closed

longvu-db mentioned this pull request Jan 8, 2026

[SPARK-54971] Add WITH SCHEMA EVOLUTION syntax for SQL INSERT #53732

Closed

xupefei wants to merge 7 commits into apache:master from xupefei:merge-schema-evolution

5 participants
