Environment
Delta-rs version:
v0.17.0
Binding: Python
Environment:
- OS: Linux, under python3 shell.
Bug
Details with steps to reproduce
I'm using deltalake v0.17.0.
Here are the steps performed:
- Read the DeltaTable from an existing S3 location: dt = DeltaTable("s3://mylocation/")
- Converted it to a pyarrow table: arrow_table = dt.to_pyarrow_table()
- Filtered the Arrow table and selected the columns of interest.
- Converted the Arrow table to a pandas DataFrame: df = arrow_table.to_pandas()
- Wrote the pandas DataFrame back to an existing, newly created Delta table. That table is empty at this point.
- The call was write_deltalake("s3://test_sample_process/", df, mode="overwrite"). I also tried it with schema_mode="overwrite".
To troubleshoot, I looked into deltalake/writer.py. The exception is thrown from line 351. The code appears to sort the data schema and the table schema and then compare them. It is using the pyarrow engine.
On visual inspection, the two schemas look the same.
Actual Error message printed
python/3.12.7/lib/python3.12/site-packages/deltalake/writer.py", line 351, in write_deltalake
raise ValueError(
ValueError: Schema of data does not match table schema
Data schema:
namespace: string
ki_record_name: string
work_center: string
kt_config: string
kt_parameters: string
mi_updated_at: timestamp[us, tz=UTC]
mi_updated_by: string
Table Schema:
namespace: string
ki_record_name: string
work_center: string not null
kt_config: string
kt_parameters: string
mi_updated_at: timestamp[us, tz=UTC] not null
-- field metadata --
comment: '"The time this record was updated"'
mi_updated_by: string not null
-- field metadata --
comment: '"The process that updated this record"'
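To compare the two dumps field by field rather than by eye, here is a minimal standard-library sketch. It assumes the schema text is pasted with its original indentation, so that the field metadata lines are indented and can be skipped. The helper names parse_schema_dump and diff_schemas are hypothetical and are not part of deltalake or pyarrow.

```python
# Hypothetical helpers for diffing two printed Arrow schemas (not a deltalake API).
DATA_SCHEMA = """\
namespace: string
ki_record_name: string
work_center: string
kt_config: string
kt_parameters: string
mi_updated_at: timestamp[us, tz=UTC]
mi_updated_by: string
"""

TABLE_SCHEMA = """\
namespace: string
ki_record_name: string
work_center: string not null
kt_config: string
kt_parameters: string
mi_updated_at: timestamp[us, tz=UTC] not null
  -- field metadata --
  comment: '"The time this record was updated"'
mi_updated_by: string not null
  -- field metadata --
  comment: '"The process that updated this record"'
"""

NOT_NULL = " not null"


def parse_schema_dump(text):
    """Map field name -> (type string, nullable flag) from a printed schema."""
    fields = {}
    for line in text.splitlines():
        # Skip blank lines and indented lines (assumed to be field metadata).
        if not line.strip() or line[0].isspace():
            continue
        name, _, type_str = line.partition(": ")
        nullable = not type_str.endswith(NOT_NULL)
        if not nullable:
            type_str = type_str[: -len(NOT_NULL)]
        fields[name] = (type_str, nullable)
    return fields


def diff_schemas(data, table):
    """Return (name, data_entry, table_entry) for every field that differs."""
    names = sorted(set(data) | set(table))
    return [(n, data.get(n), table.get(n)) for n in names if data.get(n) != table.get(n)]


diffs = diff_schemas(parse_schema_dump(DATA_SCHEMA), parse_schema_dump(TABLE_SCHEMA))
for name, data_entry, table_entry in diffs:
    print(f"{name}: data={data_entry} table={table_entry}")
```

With the dumps above, the only fields reported are the three that are marked not null in the table schema. Their type strings are identical and only the nullable flag differs.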
I would appreciate any help in figuring out why my table schema and data schema are considered a mismatch by the code when they seem to be the same.
The only differences I could find between the two schemas are that the table schema has comments and not null defined on some fields. The field names and data types are the same in both. I'm wondering what I am missing here.