DeltaError on attempting merge operation involving map field #3340

@chsgray

Environment

Delta-rs version: 0.25.4
Pandas version: 2.2.3
Pyarrow version: 18.1.0

Binding: Python

Environment:

  • OS: Ubuntu

Bug

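What happened: A DeltaError was raised when executing a DeltaTable.merge operation on a table with a map column. In the error below, the map's inner entries field is named "entries" in some branches of the failing CASE WHEN expression and "key_value" in others, while the key and value types are identical. Both names are assigned by the libraries rather than by my code, so the mismatch is beyond the user's control.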

What you expected to happen: I expected the merge to succeed.

How to reproduce it:

import pyarrow as pa
import pandas as pd
import deltalake

path = "~/tmp/test"

schema = pa.schema(
    [
        pa.field(
            "foo",
            pa.int64(),
            nullable=False,
        ),
        pa.field(
            "bar",
            pa.map_(pa.string(), pa.float64()),
            nullable=True,
        ),
    ]
)

df = pd.DataFrame(data={"foo": 1, "bar": [{"baz": 123.4}]})

deltalake.write_deltalake(
    path,
    pa.Table.from_pandas(df, schema),
    mode="append"
)

(
    deltalake.DeltaTable(path)
    .merge(
        pa.Table.from_pandas(df, schema),
        predicate="s.foo = t.foo",
        target_alias="t",
        source_alias="s",
    )
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute()
)

More details:

The traceback I saw in my Jupyter notebook:

DeltaError                                Traceback (most recent call last)
Cell In[6], line 40
     22 df = pd.DataFrame(data={"foo": 1, "bar": [{"baz": 123.4}]})
     24 deltalake.write_deltalake(
     25     path,
     26     pa.Table.from_pandas(df, schema),
     27     mode="append"
     28 )
     30 (
     31     deltalake.DeltaTable(path)
     32     .merge(
     33         pa.Table.from_pandas(df, schema),
     34         predicate="s.foo = t.foo",
     35         target_alias="t",
     36         source_alias="s",
     37     )
     38     .when_matched_update_all()
     39     .when_not_matched_insert_all()
---> 40     .execute()
     41 )

File ~/miniconda3/envs/drs/lib/python3.11/site-packages/deltalake/table.py:1930, in TableMerger.execute(self)
   1924 def execute(self) -> Dict[str, Any]:
   1925     """Executes `MERGE` with the previously provided settings in Rust with Apache Datafusion query engine.
   1926 
   1927     Returns:
   1928         Dict: metrics
   1929     """
-> 1930     metrics = self._table.merge_execute(self._builder)
   1931     return json.loads(metrics)

DeltaError: Generic DeltaTable error: type_coercion
caused by
Error during planning: Failed to coerce then ([Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false), Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false), Map(Field { name: "key_value", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false), Map(Field { name: "key_value", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false), Map(Field { name: "key_value", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)]) and else (None) to common types in CASE WHEN expression
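
To make the mismatch in the message above easier to see, here is a small sketch that counts the entries-field names and checks that the Map descriptors are otherwise identical. It only parses text; it does not call deltalake or pyarrow. The helper names map_entry_names and normalize and the abbreviated descriptor strings are illustrative assumptions, not part of any library.

```python
import re
from collections import Counter

# Abbreviated stand-in for one Map descriptor from the error above
# (dict_id, dict_is_ordered and metadata attributes trimmed for brevity).
ENTRIES = (
    'Map(Field { name: "entries", data_type: Struct(['
    'Field { name: "key", data_type: Utf8, nullable: false }, '
    'Field { name: "value", data_type: Float64, nullable: true }'
    ']), nullable: false }, false)'
)
# The other descriptor differs only in the name of the entries field.
KEY_VALUE = ENTRIES.replace('"entries"', '"key_value"')

# Same order as in the error: two "entries" branches, then three "key_value".
message = (
    "Failed to coerce then (["
    + ", ".join([ENTRIES] * 2 + [KEY_VALUE] * 3)
    + "]) and else (None) to common types in CASE WHEN expression"
)

# Matches the name of the field directly inside each Map(...) descriptor.
MAP_NAME = re.compile(r'Map\(Field \{ name: "([^"]+)"')


def map_entry_names(error_text: str) -> Counter:
    """Count the name given to each Map type's inner entries field."""
    return Counter(MAP_NAME.findall(error_text))


def normalize(descriptor: str) -> str:
    """Blank out the entries-field name so descriptors can be compared."""
    return MAP_NAME.sub('Map(Field { name: "_"', descriptor)


names = map_entry_names(message)
print(dict(names))  # {'entries': 2, 'key_value': 3}
print(normalize(ENTRIES) == normalize(KEY_VALUE))  # True
```

Run against the full message from the traceback, the same regular expression should report the same two names, which suggests the coercion fails on the field name alone rather than on the key or value types.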

Labels: bug, on-hold
