[SPARK-53482][SQL] MERGE INTO support nested case where source has less fields than target by szehon-ho · Pull Request #52225 · apache/spark

szehon-ho · 2025-09-04T01:10:27Z

What changes were proposed in this pull request?

Support MERGE INTO where the source has fewer fields than the target. This is already partially supported by #51698, but only for top-level fields. This PR extends the support to nested fields.

This patch does the following:

For MERGE INTO with UPDATE * and INSERT *, #51698 ([SPARK-52991][SQL] Implement MERGE INTO with SCHEMA EVOLUTION for V2 Data Source) already expands the * to the fields common to the source and target table schemas. This change expands the * to assignments for the common flattened (nested leaf) fields of the source and target table schemas.
Previously, INSERT did not allow specifying a leaf field. This PR adds that support, which the change above needs in order to work. The logic is similar to UPDATE.
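
The flattening step described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: `FieldType`, `Leaf`, `Struct`, and `NestedFieldPaths` are hypothetical stand-ins for Spark's schema classes, and the case-insensitive comparison is an assumption standing in for `conf.resolver`.

```scala
// Hypothetical, simplified schema model (NOT Spark's StructType/StructField).
sealed trait FieldType
case object Leaf extends FieldType
final case class Struct(fields: Seq[(String, FieldType)]) extends FieldType

object NestedFieldPaths {
  // Name comparison function; assumed here to be case-insensitive.
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Flatten a struct into the name paths of its leaf fields, e.g. Seq("s", "c1").
  def leafPaths(struct: Struct, prefix: Seq[String] = Seq.empty): Seq[Seq[String]] =
    struct.fields.flatMap {
      case (name, nested: Struct) => leafPaths(nested, prefix :+ name)
      case (name, Leaf)           => Seq(prefix :+ name)
    }

  // Two paths match when they have the same length and every part resolves.
  def samePath(a: Seq[String], b: Seq[String], resolver: Resolver): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => resolver(x, y) }

  // Source leaf paths that also exist in the target; these would become the assignments.
  def commonLeafPaths(
      source: Struct,
      target: Struct,
      resolver: Resolver = caseInsensitive): Seq[Seq[String]] = {
    val targetPaths = leafPaths(target)
    leafPaths(source).filter(s => targetPaths.exists(t => samePath(t, s, resolver)))
  }
}
```

With a target of `(id, s: struct<c1, c2>)` and a source of `(id, s: struct<c1>)`, this sketch yields the paths `id` and `s.c1`; `s.c2` has no source counterpart and receives no assignment.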

Why are the changes needed?

For cases where the source has fewer fields than the target in MERGE INTO, it should behave more gracefully (inserting null values where a source field does not exist).

Does this PR introduce any user-facing change?

No, only that this scenario used to fail and will now pass.

How was this patch tested?

Add unit test to MergeIntoTableSuiteBase

Was this patch authored or co-authored using generative AI tooling?

No


cloud-fan · 2025-09-08T09:08:56Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

-                      sourceCol => conf.resolver(sourceCol.name, targetAttr.name))
-                    .map(Assignment(targetAttr, _))}
+                val sourceAttrs = DataTypeUtils.nestedAttributes(sourceTable.output)
+                val targetAttrs = DataTypeUtils.nestedAttributes(targetTable.output)


Given we only care about the name, shall we follow StructType.findNestedField and use (Seq[String], StructField) instead of AttributeReference? Or just Seq[String]?

cloud-fan · 2025-09-08T09:10:24Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/DataTypeUtils.scala

+          val newColPath = colPath :+ field.name
+          nestedAttributes(structType, newColPath)
+        case _ => Seq(
+          AttributeReference((colPath :+ field.name).quoted,


this is fragile, we should find a way to keep the Seq[String] directly, and use it to construct UnresolvedAttribute
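
One plausible reading of the fragility concern is that encoding a multi-part name into a single string and decoding it later can lose information. The sketch below uses a deliberately naive dot-join and dot-split, which is an assumption for illustration and not the exact quoting or parsing logic used in the PR; `NamePartsRoundTrip` is a hypothetical name.

```scala
object NamePartsRoundTrip {
  // Naive encoding (assumption for illustration): join the name parts with dots.
  def encode(parts: Seq[String]): String = parts.mkString(".")

  // Naive decoding: split the string on dots.
  def decode(name: String): Seq[String] = name.split("\\.").toSeq

  // True when the parts survive an encode/decode round trip unchanged.
  def roundTrips(parts: Seq[String]): Boolean = decode(encode(parts)) == parts
}
```

A path such as `Seq("s", "c1")` round-trips, but `Seq("a.b", "c")` comes back as three parts. Carrying the `Seq[String]` through unchanged avoids the encode/decode step entirely.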

cloud-fan · 2025-09-08T09:11:17Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

+                val commonAttrs = sourceAttrs.filter(s =>
+                  targetAttrs.exists(t => conf.resolver(t.name, s.name)))
+                val assignments = commonAttrs.map{ a =>
+                  Assignment(UnresolvedAttribute(a.name), UnresolvedAttribute(a.name))}


if all nested fields are assigned, shall we just assign the struct-type column?
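
A rough sketch of what this suggestion could look like: when every leaf under a struct column is assigned, emit one assignment for the struct itself. `Schema`, `Scalar`, `Record`, and `CollapseAssignments` are hypothetical names, name matching is exact (case-sensitive) for brevity, and this is not code from the PR.

```scala
// Hypothetical, simplified schema model (NOT Spark's classes).
sealed trait Schema
case object Scalar extends Schema
final case class Record(fields: Seq[(String, Schema)]) extends Schema

object CollapseAssignments {
  // All leaf paths under a record, prefixed with the record's own path.
  def leafPaths(record: Record, prefix: Seq[String]): Seq[Seq[String]] =
    record.fields.flatMap {
      case (name, nested: Record) => leafPaths(nested, prefix :+ name)
      case (name, Scalar)         => Seq(prefix :+ name)
    }

  // Walk the target schema; where every leaf under a struct is assigned,
  // return the struct's path instead of one path per leaf.
  def collapse(
      target: Record,
      assigned: Set[Seq[String]],
      prefix: Seq[String] = Seq.empty): Seq[Seq[String]] =
    target.fields.flatMap {
      case (name, Scalar) =>
        val path = prefix :+ name
        if (assigned.contains(path)) Seq(path) else Seq.empty
      case (name, nested: Record) =>
        val path = prefix :+ name
        val leaves = leafPaths(nested, path)
        if (leaves.nonEmpty && leaves.forall(assigned.contains)) Seq(path)
        else collapse(nested, assigned, path)
    }
}
```

Partially covered structs still fall back to per-leaf assignments, so only fully covered structs are collapsed.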

szehon-ho · 2025-09-15T21:41:49Z

Testing this a little more, I find that this doesn't support the case of structs within maps or arrays, and this approach is a bit hacky. Closing in favor of the approach in #52347.

… a config

### What changes were proposed in this pull request?

#52225 allows MERGE INTO to support the case where the assignment value is a struct with fewer fields than the assignment key, i.e. UPDATE SET big_struct = source.small_struct. This makes the feature off by default, and turned on via a config.

### Why are the changes needed?

The change raised some interesting questions; for example, there is some ambiguity in user intent. Does UPDATE SET * mean set all nested fields or set the top-level columns? In the first case, missing fields are kept. In the second case, missing fields are nullified. I tried to make a choice in #53149, but after some feedback, choosing one interpretation over the other may be a bit controversial. A SQLConf may not be the right choice, and we may instead need to introduce some new syntax, which requires more discussion.

### Does this PR introduce _any_ user-facing change?

No, this feature is unreleased.

### How was this patch tested?

Existing unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #53229 from szehon-ho/disable_merge_update_source_coercion.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
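
The two interpretations of UPDATE SET * mentioned in this commit message can be illustrated with a small sketch. Struct values are modeled as maps from field name to an optional value, with `None` standing in for SQL NULL; `StructUpdateSemantics` and both function names are hypothetical and only illustrate the ambiguity, not Spark's behavior.

```scala
object StructUpdateSemantics {
  // A struct value: field name -> value, where None models NULL.
  type StructValue = Map[String, Option[Int]]

  // Interpretation 1: assign nested fields one by one.
  // Fields missing from the source keep their current target value.
  def updateFieldWise(target: StructValue, source: StructValue): StructValue =
    target.map { case (field, current) => field -> source.getOrElse(field, current) }

  // Interpretation 2: assign the top-level struct column as a whole.
  // Fields missing from the source become NULL.
  def updateWholeStruct(target: StructValue, source: StructValue): StructValue =
    target.map { case (field, _) => field -> source.getOrElse(field, None) }
}
```

For a target struct `{c1: 1, c2: 2}` and a source struct `{c1: 10}`, the first interpretation gives `{c1: 10, c2: 2}` and the second gives `{c1: 10, c2: NULL}`.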

… a config [same commit message as above] (cherry picked from commit 23d9253) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

[SPARK-53482][SQL] MERGE INTO support nested case where source has le…

988d0c8

…ss fields than target

github-actions bot added the SQL label Sep 4, 2025

szehon-ho added 3 commits September 4, 2025 13:36

Fix some tests

6b74861

Fix PlanResolutionSuite

d37d351

Fix another test

d59d9b9

cloud-fan reviewed Sep 8, 2025


szehon-ho closed this Sep 15, 2025

szehon-ho mentioned this pull request Nov 26, 2025

[SPARK-54525][SQL] Disable nested struct coercion in MERGE INTO under a config #53229

Closed

szehon-ho wants to merge 4 commits into apache:master from szehon-ho:nested_merge
