
Unable to use write_deltalake on nullable columns with mode = "overwrite" #3387

@joepatol

Description


Environment

Delta-rs version: 0.25.4

Binding: Python

Environment:

  • Cloud provider: Azure/Local
  • OS: Ubuntu
  • Other:

Bug

What happened:

I'm creating a Delta table with a nullable column. If I then use write_deltalake with mode="overwrite" to write only null values to that nullable column, I receive an error: _internal.SchemaMismatchError: Invalid data type for Delta Lake: Null

What you expected to happen:

I'd expect the write to succeed, resulting in a column containing only null values.

How to reproduce it:

import pandas as pd
from deltalake import DeltaTable, write_deltalake
from deltalake.schema import Schema, PrimitiveType, Field


table_name = "some_table"

# "text" is declared nullable in the Delta schema
schema = Schema([
    Field("id", PrimitiveType("integer"), nullable=False),
    Field("text", PrimitiveType("string"), nullable=True),
])

# Create the table on the first run, reuse it afterwards
if not DeltaTable.is_deltatable(table_name):
    dt = DeltaTable.create(table_name, schema=schema)
else:
    dt = DeltaTable(table_name)


# The "text" column holds only None values
df = pd.DataFrame(
    columns=["id", "text"],
    data=[
        [2, None],
    ]
)


# Raises _internal.SchemaMismatchError: Invalid data type for Delta Lake: Null
write_deltalake(dt, df, mode="overwrite")

More details:

I receive the same error without mode="overwrite" as well.
The same happens when the table already contains data.

Using TableMerger lets me write this data successfully.

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
