Description
Allow specifying which versions to retain when performing a vacuum operation.
Use Case
We have specific data history that we want to keep. Generally, we want to read either:
- A recentish data version (within the last 7 days).
- A collection of old fixed versions going back 250 days.
We cannot run a general vacuum operation, as that would remove files referenced by the old fixed versions (the second case above).
What we would like is more control over the vacuum operation, so that we can specify which versions to keep along with their associated files.
So, our old delta files directory might look like:
{version 0-files}
{version 1-files}
{version 2-files}
...
{version 178-files}
{version 179-files}
{version 180-files}
And then we want to perform a vacuum where we keep versions {2, 179, 180} to get a new directory like:
{version 2-files}
{version 179-files}
{version 180-files}
We propose enhancing the vacuum operation with an option on the VacuumBuilder to keep specific versions. The idea would be to provide a vector of versions to keep. Then, after VACUUM, it would still be possible to time travel back to the specified versions.
https://docs.rs/deltalake/latest/deltalake/operations/vacuum/struct.VacuumBuilder.html
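To make the intended semantics concrete, here is a minimal sketch of the file selection we have in mind. It is not delta-rs code: `files_to_delete` and its inputs are hypothetical names for illustration, and it assumes we already know which data files each version references. A real vacuum would also have to account for the retention period and for files not referenced by any version.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper (not part of delta-rs): given the data files that
/// each table version references, return the files that none of the kept
/// versions needs.
pub fn files_to_delete(
    files_by_version: &BTreeMap<u64, BTreeSet<String>>,
    keep_versions: &[u64],
) -> BTreeSet<String> {
    // Union of every file referenced by a version we want to keep.
    let retained: BTreeSet<&String> = keep_versions
        .iter()
        .filter_map(|v| files_by_version.get(v))
        .flatten()
        .collect();

    // Any file referenced by some version but by no kept version can go.
    files_by_version
        .values()
        .flatten()
        .filter(|f| !retained.contains(f))
        .cloned()
        .collect()
}
```

The important detail is that a file shared between a kept version and a dropped version must survive, which is why the sketch works on the union of referenced files rather than deleting per version.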
Related Issue(s)
[Delete older files based on VERSION](#3143)
Background Reading
Vacuuming: https://delta.io/blog/remove-files-delta-lake-vacuum-command/
Time Travel: https://delta.io/blog/2023-02-01-delta-lake-time-travel/