
z-order fails on table that is partitioned by value with space #2834

@junhl

Environment

Delta-rs version: 0.19.1

Binding: Python

Environment:

  • Cloud provider: GCP
  • OS: macOS
  • Other:

Bug

Running z-order on a table that is partitioned by a column whose values contain a space raises an error.
I reproduced the issue on the local filesystem (macOS) and on Google Cloud Storage (I did not try AWS).

What happened:
Create a Delta table and partition it by a column whose values contain a space (e.g. {"country": "Costa Rica"}). The partition directory is created successfully as country=Costa%20Rica, with the space percent-encoded in the path.

When you run z-order on the table, the space is not percent-encoded (e.g. it looks for country=Costa Rica), so the file is not found.
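The mismatch between the two directory names can be illustrated with the Python standard library alone. This is only a sketch of the symptom, not delta-rs's actual path handling: the helper names `encoded_partition_dir` and `raw_partition_dir` are hypothetical, and using `urllib.parse.quote` to stand in for the writer's encoding is an assumption.

```python
from urllib.parse import quote


def encoded_partition_dir(column: str, value: str) -> str:
    """Hypothetical helper: Hive-style directory name with the value percent-encoded."""
    # Assumption: quote() stands in for whatever encoding the writer applies.
    return f"{column}={quote(value, safe='')}"


def raw_partition_dir(column: str, value: str) -> str:
    """Hypothetical helper: the same directory name with the value left as-is."""
    return f"{column}={value}"


on_disk = encoded_partition_dir("country", "Costa Rica")
looked_up = raw_partition_dir("country", "Costa Rica")

print(on_disk)               # country=Costa%20Rica
print(looked_up)             # country=Costa Rica
print(on_disk == looked_up)  # False: the lookup misses the directory on disk
```

Values without a space, such as "Canada", produce the same name either way, which would explain why only some partitions fail.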

What you expected to happen:

z-order completes successfully, using the percent-encoded path.

How to reproduce it:

from deltalake import write_deltalake, DeltaTable
import pandas as pd

df = pd.DataFrame(
    {
        "user": ["James", "Anna", "Sara", "Martin"],
        "country": ["United States", "Canada", "Costa Rica", "South Africa"],
        "age": [34, 23, 45, 26],
    }
)

table_path = "./test_table"
write_deltalake(
    table_or_uri=table_path,
    data=df,
    mode="overwrite",
    partition_by=["country"],
)

test_table = DeltaTable(table_path)

# retrieve by partition works fine
partitioned_df = test_table.to_pandas(
    partitions=[("country", "=", "United States")],
)
print(partitioned_df)

# compact works fine
test_table.optimize.compact()

# z-order does not work
test_table.optimize.z_order(columns=["user"])

# leads to FileNotFoundError: Object at location [...]/test_table/country=Costa Rica/part-00001-78751b37-6dc0-4232-8926-9ddd6f8f14f2-c000.snappy.parquet not found: No such file or directory (os error 2)
# but real path is [...]/test_table/country=Costa%20Rica/part-00001-78751b37-6dc0-4232-8926-9ddd6f8f14f2-c000.snappy.parquet

More details:

I tried a few other operations available at the DeltaTable level. Z-ordering seems to be the only one with this path issue. I believe the fix needs to be on the Rust side.
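To see which partitions of a local table are likely to be affected, one option is to list the partition directories whose names change when percent-decoded. The sketch below uses only the standard library; `find_encoded_partition_dirs` is a hypothetical helper, and it assumes a local table with a single level of partition directories, as in the reproduction above.

```python
from pathlib import Path
from urllib.parse import unquote


def find_encoded_partition_dirs(table_path):
    """Return (on_disk_name, decoded_name) pairs for top-level partition
    directories whose names differ once percent-decoded."""
    affected = []
    for entry in sorted(Path(table_path).iterdir()):
        # Skip data files and internal directories such as _delta_log.
        if not entry.is_dir() or entry.name.startswith("_"):
            continue
        decoded = unquote(entry.name)
        if decoded != entry.name:
            affected.append((entry.name, decoded))
    return affected


if __name__ == "__main__":
    # "./test_table" matches the path used in the reproduction script.
    if Path("./test_table").is_dir():
        for on_disk, decoded in find_encoded_partition_dirs("./test_table"):
            print(f"{on_disk!r} on disk, {decoded!r} when decoded")
```

This only inspects directory names; it does not read the Delta log or change the table.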

Labels

binding/python (Issues for the Python package), bug (Something isn't working)
