
write_deltalake is not creating checkpoints #1815

@yefetBenTili

Description


Delta-rs version: 0.10.0

Binding:

Environment:
Cloud provider: AWS
OS: macOS
Other:


We have a Delta Lake table on S3 with over 2 TB of data, which we write to daily using write_deltalake (writing new partitions every day with partition filters).

After a few weeks we noticed a significant decline in read performance. On further investigation, I discovered that no checkpoint files were being written: the log now holds over 4,000 transaction JSON files and not a single checkpoint file.
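One way to confirm this symptom is to count commit files versus checkpoints in the table's `_delta_log` directory. The following is a minimal stdlib sketch. It assumes a local copy of the log (for example, synced down from S3) and the Delta protocol's file naming: 20-digit zero-padded `<version>.json` commits and `<version>.checkpoint.parquet` (or multi-part) checkpoints. `summarize_delta_log` is a hypothetical helper, not part of delta-rs.

```python
import json
import re
from pathlib import Path

# Commit files: 20-digit zero-padded version number, e.g. 00000000000000000042.json
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
# Single-part (<v>.checkpoint.parquet) or multi-part
# (<v>.checkpoint.<part>.<parts>.parquet) checkpoint files.
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d+\.\d+)?\.parquet$")


def summarize_delta_log(log_dir):
    """Count commit JSON files and checkpoints in a local _delta_log directory."""
    log_dir = Path(log_dir)
    commits, checkpoints = [], set()
    for entry in log_dir.iterdir():
        if m := COMMIT_RE.match(entry.name):
            commits.append(int(m.group(1)))
        elif m := CHECKPOINT_RE.match(entry.name):
            checkpoints.add(int(m.group(1)))

    # _last_checkpoint is a small JSON file pointing at the newest checkpoint.
    last_cp_file = log_dir / "_last_checkpoint"
    last_checkpoint = (
        json.loads(last_cp_file.read_text())["version"] if last_cp_file.exists() else None
    )

    # Without a checkpoint, a reader has to replay every commit JSON file;
    # with one, only the commits after the newest checkpoint.
    base = max(checkpoints, default=-1)
    return {
        "commits": len(commits),
        "latest_version": max(commits, default=None),
        "checkpoint_versions": sorted(checkpoints),
        "last_checkpoint": last_checkpoint,
        "commits_to_replay": sum(1 for v in commits if v > base),
    }
```

On a log like the one described here, `checkpoint_versions` would come back empty and `commits_to_replay` would equal the total commit count.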

As far as I know, Delta's default behavior is to write a checkpoint every 10 versions. Is there a way to enforce this, or to trigger a checkpoint manually?

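    from deltalake import write_deltalake

    write_deltalake(
        table_uri,  # placeholder: S3 URI of the table (not shown in the original snippet)
        df,
        mode="overwrite",
        schema=config.persrec_history_schema,
        # S3 has no atomic rename; this skips the locking client (safe only with a single writer)
        storage_options={"AWS_S3_ALLOW_UNSAFE_RENAME": "True"},
        partition_by=[*partition_dict.keys()],
        # (column, op, value) tuples selecting which partitions to overwrite
        partition_filters=partition_filters,
    )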

Labels: bug (Something isn't working)