
Commit b211db5

fix(delete): use Add metadata for partition only DELETE (delta-io#4150)
# Description

Fix regression where `DELETE` with partition-only predicates failed to remove empty files in matching partitions.

**Root Cause:** When the delete predicate references only partition columns, file removal should be decided from log metadata alone. The prior implementation relied on scan-derived results, and empty files (zero rows) produced no matches, so they weren't removed even though their partition satisfied the predicate.

**Changes:**
- Add partition-only delete fast path: evaluates the predicate against `Add.partition_values` (metadata only)
- Refine partition-only predicate detection, remove redundant validation
- Add `remove_from_add` helper for removing files via their `Add` actions
- Add helper for partition predicate file finding

**Tests:**

Regression coverage for:
- Partition-only deletes remove empty files in matching partitions (repro for delta-io#4149)
- NULL partition values handled correctly
- Partition-only path avoids unnecessary scanning

```bash
cargo fmt --check
cargo test --workspace
cargo test -p deltalake-core --features datafusion
```

# Related Issue(s)

- Fixes delta-io#4149

# Documentation

Notes
- Scoped to partition-only predicates; intended semantics preserved
- NULL partition handling explicitly tested

---------

Signed-off-by: Ethan Urbanski <ethan@urbanskitech.com>
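
The root cause described above can be illustrated with a minimal, self-contained sketch. It is not the delta-rs implementation: `AddFile`, `PartitionPredicate`, and `files_to_remove` are hypothetical names invented for this illustration, and the predicate is reduced to equality and `IS NULL` on one column. The point it shows is that a decision made from partition values alone selects an empty file just like any other file in the partition, whereas a decision based on matched rows would skip it.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a Delta log `Add` action.
#[derive(Debug, Clone)]
pub struct AddFile {
    pub path: String,
    /// Partition column -> value; `None` models a NULL partition value.
    pub partition_values: HashMap<String, Option<String>>,
    /// Row count from file statistics; 0 models an empty file.
    pub num_records: u64,
}

impl AddFile {
    /// Convenience constructor for a file with a single partition column.
    pub fn new(path: &str, col: &str, value: Option<&str>, num_records: u64) -> Self {
        let mut partition_values = HashMap::new();
        partition_values.insert(col.to_string(), value.map(str::to_string));
        AddFile { path: path.to_string(), partition_values, num_records }
    }
}

/// Hypothetical partition-only predicate, just enough for the illustration.
pub enum PartitionPredicate {
    Eq(String, String),
    IsNull(String),
}

impl PartitionPredicate {
    /// Evaluate against log metadata only; no data file is read.
    pub fn matches(&self, add: &AddFile) -> bool {
        match self {
            // SQL-style semantics: a NULL partition value never equals a literal.
            PartitionPredicate::Eq(col, want) => {
                matches!(add.partition_values.get(col), Some(Some(v)) if v == want)
            }
            PartitionPredicate::IsNull(col) => {
                matches!(add.partition_values.get(col), Some(None))
            }
        }
    }
}

/// Select every file whose partition satisfies the predicate.
/// `num_records` is deliberately ignored, so empty files are selected too.
pub fn files_to_remove<'a>(adds: &'a [AddFile], pred: &PartitionPredicate) -> Vec<&'a AddFile> {
    adds.iter().filter(|add| pred.matches(add)).collect()
}
```

Calling `files_to_remove` with `PartitionPredicate::Eq("date".into(), "2024-01-01".into())` on a list that contains a zero-row file in that partition returns the zero-row file alongside the populated ones, while a file whose `date` is NULL is left alone.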
1 parent 4ac9f71 commit b211db5

2 files changed

Lines changed: 519 additions & 15 deletions

crates/core/src/delta_datafusion/find_files.rs

Lines changed: 1 addition & 7 deletions
@@ -410,10 +410,6 @@ pub(crate) struct MatchedFilesScan {
     /// we do not (yet) have full coverage to translate datafusion to
     /// kernel predicates.
     pub(crate) delta_predicate: Arc<Predicate>,
-    /// The predicate contains only partition column references
-    ///
-    /// This implies that for each matched file all data matches.
-    pub(crate) partition_only: bool,
 }
 
 impl MatchedFilesScan {
@@ -467,9 +463,8 @@ pub(crate) async fn scan_files_where_matches(
         result: Ok(()),
     };
     for term in &skipping_pred {
-        visitor.result = Ok(());
         term.visit(&mut visitor)?;
-        visitor.result?
+        std::mem::replace(&mut visitor.result, Ok(()))?;
     }
 
     // convert to a delta predicate that can be applied to kernel scans.
@@ -554,7 +549,6 @@ pub(crate) async fn scan_files_where_matches(
         valid_files,
         predicate,
         delta_predicate,
-        partition_only: visitor.partition_only,
     }))
 }
 