Environment
Delta-rs version: 0.25.4
Binding: Python
Environment:
- Cloud provider: Azure/Local
- OS: Ubuntu
- Other:
Bug
What happened:
I'm creating a Delta table with a nullable column. If I then use write_deltalake with mode="overwrite" to write only null values to that nullable column, I receive an error: _internal.SchemaMismatchError: Invalid data type for Delta Lake: Null
What you expected to happen:
I'd expect the write to be successful, resulting in a column containing only null values.
How to reproduce it:
import pandas as pd
from deltalake import DeltaTable, write_deltalake
from deltalake.schema import Schema, PrimitiveType, Field

table_name = "some_table"
schema = Schema([
    Field("id", PrimitiveType("integer"), nullable=False),
    Field("text", PrimitiveType("string"), nullable=True),
])

if not DeltaTable.is_deltatable(table_name):
    dt = DeltaTable.create(table_name, schema=schema)
else:
    dt = DeltaTable(table_name)

df = pd.DataFrame(
    columns=["id", "text"],
    data=[
        [2, None],
    ]
)

write_deltalake(dt, df, mode="overwrite")
More details:
I receive the same error without mode="overwrite" as well.
The same happens when the table already contains data.
Using TableMerger allows me to successfully write this data.