[SPARK-53482][SQL] MERGE INTO support nested case where source has less fields than target #52225
Closed
szehon-ho wants to merge 4 commits into apache:master from
cloud-fan
reviewed
Sep 8, 2025
    sourceCol => conf.resolver(sourceCol.name, targetAttr.name))
    .map(Assignment(targetAttr, _))}
val sourceAttrs = DataTypeUtils.nestedAttributes(sourceTable.output)
val targetAttrs = DataTypeUtils.nestedAttributes(targetTable.output)
Contributor
given we only care about the name, shall we follow StructType.findNestedField and use (Seq[String], StructField) instead of AttributeReference? or just Seq[String]
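A minimal sketch of what the reviewer's suggestion could look like: collecting each nested leaf field as a name path (Seq[String]) rather than as an AttributeReference. The types `DType`, `StructT`, `Field` and the helper `leafPaths` are hypothetical stand-ins so the example runs without Spark; they are not Spark's StructType or the PR's actual code.

```scala
// Hypothetical, simplified stand-ins for Spark's data types (illustration only).
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class Field(name: String, dataType: DType)
final case class StructT(fields: Seq[Field]) extends DType

object NestedPaths {
  // Returns the name path of every leaf field, recursing only into structs.
  def leafPaths(struct: StructT, prefix: Seq[String] = Seq.empty): Seq[Seq[String]] =
    struct.fields.flatMap { f =>
      val path = prefix :+ f.name
      f.dataType match {
        case s: StructT => leafPaths(s, path) // descend into nested struct
        case _          => Seq(path)          // leaf: keep the path as name parts
      }
    }
}
```

Keeping the path as separate name parts means no quoting or re-parsing is needed when the path is later turned into an attribute reference.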
cloud-fan
reviewed
Sep 8, 2025
    val newColPath = colPath :+ field.name
    nestedAttributes(structType, newColPath)
  case _ => Seq(
    AttributeReference((colPath :+ field.name).quoted,
Contributor
this is fragile, we should find a way to keep the Seq[String] directly, and use it to construct UnresolvedAttribute
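One way to see why a string round trip can be fragile is the sketch below. It uses a deliberately naive encoding (join on dots, split on dots) and is not Spark's actual quoting logic; `encode`, `decode` and `roundTrips` are hypothetical names introduced only for illustration.

```scala
object PathEncoding {
  // Naive encoding: join the name parts with dots (assumption, not Spark's `.quoted`).
  def encode(path: Seq[String]): String = path.mkString(".")

  // Naive decoding: split the string on dots.
  def decode(s: String): Seq[String] = s.split("\\.").toSeq

  // True only if the path survives being rendered to a string and parsed back.
  def roundTrips(path: Seq[String]): Boolean = decode(encode(path)) == path
}
```

With this encoding, a path such as Seq("s", "a.b") comes back as three parts instead of two. Carrying the Seq[String] through directly avoids depending on any quoting and parsing rules at all.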
cloud-fan
reviewed
Sep 8, 2025
val commonAttrs = sourceAttrs.filter(s =>
  targetAttrs.exists(t => conf.resolver(t.name, s.name)))
val assignments = commonAttrs.map{ a =>
  Assignment(UnresolvedAttribute(a.name), UnresolvedAttribute(a.name))}
Contributor
if all nested fields are assigned, shall we just assign the struct-type column?
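The collapsing idea in this comment could be sketched roughly as follows, assuming fields are tracked as name paths. `collapse`, `targetLeaves` and `assigned` are hypothetical names, and this is an illustration of the idea rather than how the analyzer would implement it.

```scala
object CollapseAssignments {
  type Path = Seq[String]

  // If every target leaf under `structPath` is assigned, return the struct path
  // itself (one struct-level assignment). Otherwise return only the assigned leaves.
  def collapse(structPath: Path, targetLeaves: Seq[Path], assigned: Set[Path]): Seq[Path] = {
    val leavesUnder = targetLeaves.filter(_.startsWith(structPath))
    if (leavesUnder.nonEmpty && leavesUnder.forall(assigned.contains)) Seq(structPath)
    else leavesUnder.filter(assigned.contains)
  }
}
```

For a target with leaves s.c1 and s.c2, assigning both collapses to a single assignment of s, while assigning only s.c1 keeps the field-level assignment.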
Member
Author
Testing this a little more, I find this doesn't support the case of structs within maps or arrays, and this approach is a bit hacky. Closing in favor of the approach in #52347.
dongjoon-hyun
pushed a commit
that referenced
this pull request
Nov 29, 2025
… a config

### What changes were proposed in this pull request?
#52225 allows MERGE INTO to support the case where the assignment value is a struct with fewer fields than the assignment key, i.e. UPDATE SET big_struct = source.small_struct. This makes the feature off by default, turned on via a config.

### Why are the changes needed?
The change raised some interesting questions; for example, there is some ambiguity in user intent. Does UPDATE SET * mean set all nested fields, or set top-level columns? In the first case, missing fields are kept. In the second case, missing fields are nullified. I tried to make a choice in #53149, but after some feedback, choosing one interpretation over the other may be a bit controversial. A SQLConf may not be the right choice; instead we may need to introduce some new syntax, which requires more discussion.

### Does this PR introduce _any_ user-facing change?
No, this feature is unreleased.

### How was this patch tested?
Existing unit test

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53229 from szehon-ho/disable_merge_update_source_coercion.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
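The two interpretations of UPDATE SET * mentioned in the commit message above can be illustrated with a small model. This is only a sketch: a struct value is modeled as a map from field name to an optional value (None standing in for NULL), and `updateNestedFields` and `replaceStruct` are hypothetical names, not Spark APIs.

```scala
object UpdateSemantics {
  // A struct value: field name -> optional value (None stands for NULL).
  type StructValue = Map[String, Option[Int]]

  // Interpretation 1: set nested fields one by one.
  // Target fields missing from the source keep their current values.
  def updateNestedFields(target: StructValue, source: StructValue): StructValue =
    target.map { case (name, old) => name -> source.getOrElse(name, old) }

  // Interpretation 2: set the top-level column as a whole.
  // Target fields missing from the source are nullified.
  def replaceStruct(target: StructValue, source: StructValue): StructValue =
    target.map { case (name, _) => name -> source.getOrElse(name, None) }
}
```

For a target struct with c1 = 1, c2 = 2 and a source struct carrying only c1 = 10, the first interpretation leaves c2 at 2 and the second sets c2 to NULL.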
dongjoon-hyun
pushed a commit
that referenced
this pull request
Nov 29, 2025
… a config

### What changes were proposed in this pull request?
#52225 allows MERGE INTO to support the case where the assignment value is a struct with fewer fields than the assignment key, i.e. UPDATE SET big_struct = source.small_struct. This makes the feature off by default, turned on via a config.

### Why are the changes needed?
The change raised some interesting questions; for example, there is some ambiguity in user intent. Does UPDATE SET * mean set all nested fields, or set top-level columns? In the first case, missing fields are kept. In the second case, missing fields are nullified. I tried to make a choice in #53149, but after some feedback, choosing one interpretation over the other may be a bit controversial. A SQLConf may not be the right choice; instead we may need to introduce some new syntax, which requires more discussion.

### Does this PR introduce _any_ user-facing change?
No, this feature is unreleased.

### How was this patch tested?
Existing unit test

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53229 from szehon-ho/disable_merge_update_source_coercion.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 23d9253)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Support MERGE INTO where the source has fewer fields than the target. This is already partially supported as part of #51698, but only for top-level fields. This PR supports it for nested fields as well.
This patch does the following:
For UPDATE * and INSERT *, [SPARK-52991][SQL] Implement MERGE INTO with SCHEMA EVOLUTION for V2 Data Source #51698 already changed it to expand the * to the fields common to the source and target table schemas. This change now expands it to UPDATE and INSERT assignments for the common flattened fields of the source and target table schemas.
Why are the changes needed?
For cases where the source has fewer fields than the target in MERGE INTO, it should behave more gracefully (inserting null values where the source field does not exist).
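As a rough sketch of the expansion to common flattened fields, assuming the fields of each table are already available as name paths: `commonAssignments`, `Resolver` and `caseInsensitive` below are hypothetical names, and the resolver stands in for the conf.resolver used in the diff. Target paths with no match in the source receive no assignment here; per the description, those are the fields that would end up null on insert.

```scala
object CommonFields {
  type Path = Seq[String]
  type Resolver = (String, String) => Boolean

  // Stand-in for a case-insensitive name resolver (assumption for this sketch).
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Two paths match when they have the same length and every name part resolves.
  private def samePath(resolver: Resolver)(a: Path, b: Path): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => resolver(x, y) }

  // Pairs each source leaf path with the matching target leaf path, if any,
  // as (targetPath, sourcePath). Unmatched source paths are dropped.
  def commonAssignments(source: Seq[Path], target: Seq[Path], resolver: Resolver): Seq[(Path, Path)] =
    source.flatMap { s =>
      target.find(t => samePath(resolver)(t, s)).map(t => (t, s))
    }
}
```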
Does this PR introduce any user-facing change?
No, only that this scenario used to fail and will now pass.
How was this patch tested?
Add unit test to MergeIntoTableSuiteBase
Was this patch authored or co-authored using generative AI tooling?
No