[SPARK-53482][SQL] MERGE INTO support nested case where source has less fields than target #52225

Closed
szehon-ho wants to merge 4 commits into apache:master from szehon-ho:nested_merge

@szehon-ho
Member

@szehon-ho szehon-ho commented Sep 4, 2025

What changes were proposed in this pull request?

Support MERGE INTO where the source has fewer fields than the target. This is already partially supported by #51698, but only for top-level fields. This PR extends the support to nested fields.

This patch does the following:

  • For MERGE INTO with UPDATE * and INSERT *, [SPARK-52991][SQL] Implement MERGE INTO with SCHEMA EVOLUTION for V2 Data Source #51698 already expands the * to the fields common to the source and target table schemas. This change expands it further, to UPDATE and INSERT assignments for the common flattened (nested leaf) fields of the source and target table schemas.
  • Previously, INSERT did not allow specifying a leaf field. This change adds that support, which the expansion above requires. The logic is similar to that of UPDATE.
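
To illustrate the "common flattened fields" idea, here is a minimal sketch in plain Scala. It does not use Spark's classes: `DType`, `Struct`, `Leaf`, `leafPaths`, `samePath`, and `commonPaths` are hypothetical names made up for this example, and the case-insensitive comparison only approximates what a case-insensitive `conf.resolver` would do.

```scala
object FlattenSketch {
  // Hypothetical, simplified stand-ins for Spark's DataType / StructType.
  sealed trait DType
  case object Leaf extends DType
  final case class Struct(fields: Seq[(String, DType)]) extends DType

  /** Path to every leaf field, descending only into nested structs. */
  def leafPaths(dt: Struct, prefix: Seq[String] = Nil): Seq[Seq[String]] =
    dt.fields.flatMap {
      case (name, s: Struct) => leafPaths(s, prefix :+ name)
      case (name, _)         => Seq(prefix :+ name)
    }

  /** Case-insensitive, part-by-part comparison of two field paths. */
  def samePath(a: Seq[String], b: Seq[String]): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => x.equalsIgnoreCase(y) }

  /** Source leaf paths that also exist in the target. */
  def commonPaths(source: Struct, target: Struct): Seq[Seq[String]] = {
    val targetPaths = leafPaths(target)
    leafPaths(source).filter(s => targetPaths.exists(t => samePath(t, s)))
  }
}
```

With a source of `id, s.c1` and a target of `id, s.c1, s.c2, extra`, the common paths are `id` and `s.c1`, so `UPDATE *` / `INSERT *` would only generate assignments for those two leaves.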

Why are the changes needed?

When the source has fewer fields than the target in MERGE INTO, the statement should behave more gracefully, inserting null values where a source field does not exist.

Does this PR introduce any user-facing change?

No, only that this scenario used to fail and will now pass.

How was this patch tested?

Add unit test to MergeIntoTableSuiteBase

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 4, 2025
sourceCol => conf.resolver(sourceCol.name, targetAttr.name))
.map(Assignment(targetAttr, _))}
val sourceAttrs = DataTypeUtils.nestedAttributes(sourceTable.output)
val targetAttrs = DataTypeUtils.nestedAttributes(targetTable.output)
Contributor

@cloud-fan cloud-fan Sep 8, 2025

Given that we only care about the name, shall we follow StructType.findNestedField and use (Seq[String], StructField) instead of AttributeReference? Or just Seq[String]?

val newColPath = colPath :+ field.name
nestedAttributes(structType, newColPath)
case _ => Seq(
AttributeReference((colPath :+ field.name).quoted,
Contributor

This is fragile. We should find a way to keep the Seq[String] directly and use it to construct the UnresolvedAttribute.
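
One way to see the fragility, as a sketch in plain Scala rather than Spark's own quoting helpers: once a path is flattened into a single string, it has to be parsed back into parts, and a naive round trip breaks as soon as a field name contains a dot. `joinNaive`, `splitNaive`, and `quoteAll` are hypothetical helpers for this illustration only.

```scala
object PathSketch {
  // Naive round trip: join the name parts with dots, then split on dots again.
  def joinNaive(parts: Seq[String]): String = parts.mkString(".")
  def splitNaive(name: String): Seq[String] = name.split('.').toSeq

  // Backtick-quoting every part (and doubling embedded backticks) keeps the
  // string unambiguous, but reading it back requires a real identifier parser.
  def quoteAll(parts: Seq[String]): String =
    parts.map(p => "`" + p.replace("`", "``") + "`").mkString(".")
}
```

Keeping the `Seq[String]` end to end avoids the question entirely, because no string form ever needs to be re-parsed.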

val commonAttrs = sourceAttrs.filter(s =>
targetAttrs.exists(t => conf.resolver(t.name, s.name)))
val assignments = commonAttrs.map{ a =>
Assignment(UnresolvedAttribute(a.name), UnresolvedAttribute(a.name))}
Contributor

If all nested fields are assigned, shall we just assign the struct-type column?
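
A rough sketch of what that collapsing could look like, in plain Scala and under the assumption that paths are compared exactly (no resolver). `collapse` and `Path` are hypothetical names; this is not code from the PR.

```scala
object CollapseSketch {
  type Path = Seq[String]

  /**
   * Given every leaf path of the target and the leaf paths that will be assigned,
   * replaces a group of leaves with their parent path when every target leaf
   * under that parent is assigned.
   */
  def collapse(targetLeaves: Seq[Path], assigned: Seq[Path]): Seq[Path] = {
    val assignedSet = assigned.toSet
    // A prefix is fully covered when every target leaf under it is assigned.
    def covered(prefix: Path): Boolean =
      targetLeaves.filter(_.startsWith(prefix)).forall(assignedSet.contains)
    // For each assigned leaf, keep the shortest non-empty prefix that is fully covered.
    assigned.map { leaf =>
      (1 to leaf.length).map(n => leaf.take(n)).find(covered).getOrElse(leaf)
    }.distinct
  }
}
```

For a target with leaves `id, s.c1, s.c2, t.x, t.y` and assignments to `id, s.c1, s.c2, t.x`, this yields `id, s, t.x`: the struct `s` is assigned as a whole, while `t` stays field-by-field because `t.y` is not assigned.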

@szehon-ho
Member Author

After testing this a little more, I found that it doesn't support structs within maps or arrays, and the approach is a bit hacky. Closing in favor of the approach in #52347.
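
One plausible reading of the array/map limitation, sketched in plain Scala with hypothetical types (`DType`, `Struct`, `ArrayOf`, `MapOf`, `leafPaths` are not Spark classes): a flattening that only recurses into structs stops at an array or map column, so a struct nested inside it is never expanded field by field.

```scala
object ContainerSketch {
  // Hypothetical, simplified stand-ins for Spark's data types.
  sealed trait DType
  case object Leaf extends DType
  final case class Struct(fields: Seq[(String, DType)]) extends DType
  final case class ArrayOf(element: DType) extends DType
  final case class MapOf(key: DType, value: DType) extends DType

  /** Leaf paths, descending only into structs. */
  def leafPaths(dt: Struct, prefix: Seq[String] = Nil): Seq[Seq[String]] =
    dt.fields.flatMap {
      case (name, s: Struct) => leafPaths(s, prefix :+ name)
      // Arrays and maps end the path here, even if they contain structs.
      case (name, _)         => Seq(prefix :+ name)
    }
}
```

If the source has `items: array<struct<a>>` and the target has `items: array<struct<a, b>>`, both flatten to the single path `items`, so the name-based expansion sees a "common" column whose element types still differ.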

@szehon-ho szehon-ho closed this Sep 15, 2025
dongjoon-hyun pushed a commit that referenced this pull request Nov 29, 2025
… a config

### What changes were proposed in this pull request?
#52225 allows MERGE INTO to support the case where the assignment value is a struct with fewer fields than the assignment key, i.e. UPDATE SET big_struct = source.small_struct.

This change makes the feature off by default; it is turned on via a config.

### Why are the changes needed?

The change raised some interesting questions; for example, there is some ambiguity in user intent. Does UPDATE SET * mean "set all nested fields" or "set all top-level columns"? In the first case, missing fields are kept. In the second case, missing fields are nullified.

I tried to make a choice in #53149, but after some feedback, choosing one interpretation over the other may be a bit controversial. A SQLConf may not be the right choice; instead, we may need to introduce new syntax, which requires more discussion.
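
The two readings of UPDATE SET * can be sketched on a single struct value. This is plain Scala, not Spark code; `StructValue`, `updateNestedFields`, and `updateTopLevel` are hypothetical names, and `None` stands in for SQL NULL.

```scala
object UpdateStarSketch {
  // A struct value modelled as field name -> optional value (None stands for NULL).
  type StructValue = Map[String, Option[Int]]

  /** "Set all nested fields": only fields present in the source are overwritten. */
  def updateNestedFields(target: StructValue, source: StructValue): StructValue =
    target ++ source.filter { case (k, _) => target.contains(k) }

  /** "Set top-level columns": the struct is replaced; fields missing from the source become NULL. */
  def updateTopLevel(target: StructValue, source: StructValue): StructValue =
    target.map { case (k, _) => k -> source.getOrElse(k, None) }
}
```

Starting from a target struct `(c1 = 1, c2 = 2)` and a source struct `(c1 = 10)`, the first reading gives `(c1 = 10, c2 = 2)` and the second gives `(c1 = 10, c2 = NULL)`.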

### Does this PR introduce _any_ user-facing change?
No, this feature is unreleased.

### How was this patch tested?
Existing unit test

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53229 from szehon-ho/disable_merge_update_source_coercion.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Nov 29, 2025
… a config

(cherry picked from commit 23d9253)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>