Environment
Delta-rs version: 0.19.1
Binding: Python
Environment:
- Cloud provider: GCP
- OS: MacOS
- Other:
Bug
When you try to z-order on a table that is partitioned by a field whose value contains space, it errors out.
I was able to reproduce the issue on local filesystem (MacOS) as well as Google Cloud Storage (I didn't try AWS).
What happened:
Create a delta table and partition it by a column, whose values contain space (e.g. {"country": "Costa Rica"}). The partition is successfully created with country=Costa%20Rica to mitigate space in path.
When you perform z-order on the table, the space is not mitigated (e.g. it looks for country=Costa Rica)
What you expected to happen:
z-order is performed successfully with using the mitigated path.
How to reproduce it:
from deltalake import write_deltalake, DeltaTable
import pandas as pd
df = pd.DataFrame(
{
"user": ["James", "Anna", "Sara", "Martin"],
"country": ["United States", "Canada", "Costa Rica", "South Africa"],
"age": [34, 23, 45, 26],
}
)
table_path = "./test_table"
write_deltalake(
table_or_uri=table_path,
data=df,
mode="overwrite",
partition_by=["country"],
)
test_table = DeltaTable(table_path)
# retrieve by partition works fine
partitioned_df = test_table.to_pandas(
partitions=[("country", "=", "United States")],
)
print(partitioned_df)
# compact works fine
test_table.optimize.compact()
# z-order does not work
test_table.optimize.z_order(columns=["user"])
# leads to FileNotFoundError: Object at location [...]/test_table/country=Costa Rica/part-00001-78751b37-6dc0-4232-8926-9ddd6f8f14f2-c000.snappy.parquet not found: No such file or directory (os error 2)
# but real path is [...]/test_table/country=Costa%20Rica/part-00001-78751b37-6dc0-4232-8926-9ddd6f8f14f2-c000.snappy.parquet
More details:
I tried few other operations available at DeltaTable level. Z-Ordering seems to be the only one that has this pathing issue. I believe the fix needs to be on Rust side.
Environment
Delta-rs version: 0.19.1
Binding: Python
Environment:
Bug
When you try to z-order on a table that is partitioned by a field whose value contains space, it errors out.
I was able to reproduce the issue on local filesystem (MacOS) as well as Google Cloud Storage (I didn't try AWS).
What happened:
Create a delta table and partition it by a column, whose values contain space (e.g. {"country": "Costa Rica"}). The partition is successfully created with
country=Costa%20Ricato mitigate space in path.When you perform z-order on the table, the space is not mitigated (e.g. it looks for
country=Costa Rica)What you expected to happen:
z-order is performed successfully with using the mitigated path.
How to reproduce it:
More details:
I tried few other operations available at DeltaTable level. Z-Ordering seems to be the only one that has this pathing issue. I believe the fix needs to be on Rust side.