Bugfix: don't roll back the whole scan batch on one duplicate name

vdavid · vdavid · commit eb69228705c1 · 2026-05-26T00:03:15.000+02:00
- `insert_entries_v2_batch` used plain `INSERT` inside a savepoint, so a single `(parent_id, name_folded)` collision (case-sensitive volumes with `Foo.txt`/`foo.txt` siblings, NFC/NFD twins from cross-OS sync, etc.) rolled back the entire 2000-entry batch. That silently dropped 1999 unrelated rows and orphaned every descendant whose `parent_id` referred to a row in that batch.
- Switch to `INSERT OR IGNORE` and return a `Vec<bool>` parallel to the input so the caller can tell which rows actually landed.
- `handle_insert_entries_v2` filters the accumulator input by that vec, so the in-memory aggregation state still satisfies "never claim more than the DB has" (the constraint that protects against the historical 1.83 TB ghost size on `..`).
- Log conflicts at WARN with a sample of `(parent_id, name)` pairs so the offending filesystem is diagnosable.
- `accumulate` now takes `impl IntoIterator<Item = &EntryRow>` so the filtered-iteration path doesn't need an extra clone.
- Update the existing accumulator-consistency test to assert the new per-row behavior; rename it to `handle_insert_entries_v2_only_accumulates_rows_that_landed`. Replace the old "duplicate rejected" store test with one that verifies graceful skip + survivors land.
- Update `indexing/CLAUDE.md` gotchas to reflect the new contract.
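
The per-row contract described above can be sketched without SQLite. This is only an illustration under stated assumptions: `FakeStore`, `Accumulator`, `insert_and_accumulate`, and the trimmed `EntryRow` are hypothetical stand-ins, a `HashSet` plays the role of the `UNIQUE (parent_id, name_folded)` index, and plain lowercasing stands in for `normalize_for_comparison`.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down stand-in for the real `EntryRow`.
#[derive(Debug, Clone)]
pub struct EntryRow {
    pub id: i64,
    pub parent_id: i64,
    pub name: String,
    pub logical_size: u64,
}

/// Stand-in for the table's UNIQUE (parent_id, name_folded) index.
/// Assumption: lowercasing approximates the real name folding.
#[derive(Default)]
pub struct FakeStore {
    unique: HashSet<(i64, String)>,
}

impl FakeStore {
    /// Mimics `INSERT OR IGNORE`: one flag per input row, `true` if it landed.
    /// A conflicting row is skipped; the rest of the batch is unaffected.
    pub fn insert_batch(&mut self, entries: &[EntryRow]) -> Vec<bool> {
        entries
            .iter()
            .map(|e| self.unique.insert((e.parent_id, e.name.to_lowercase())))
            .collect()
    }
}

/// Simplified accumulator: counts rows and logical bytes only.
#[derive(Default)]
pub struct Accumulator {
    pub entries_inserted: u64,
    pub logical_bytes: u64,
}

impl Accumulator {
    /// Takes any iterator of `&EntryRow`, so callers can pre-filter without cloning.
    pub fn accumulate<'a>(&mut self, entries: impl IntoIterator<Item = &'a EntryRow>) {
        for e in entries {
            self.entries_inserted += 1;
            self.logical_bytes += e.logical_size;
        }
    }
}

/// Inserts the batch, then accumulates only the rows whose flag is `true`,
/// so the in-memory totals never claim more than the store holds.
pub fn insert_and_accumulate(
    store: &mut FakeStore,
    acc: &mut Accumulator,
    entries: &[EntryRow],
) -> Vec<bool> {
    let inserted = store.insert_batch(entries);
    acc.accumulate(
        entries
            .iter()
            .zip(inserted.iter())
            .filter(|(_, landed)| **landed)
            .map(|(e, _)| e),
    );
    inserted
}
```

With a batch of `Foo.txt`, `foo.txt`, and `unrelated.txt` under the same parent, the first and third rows land and the accumulator counts only those two, which mirrors what the updated tests in this commit assert against the real store.
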
diff --git a/apps/desktop/src-tauri/src/indexing/CLAUDE.md b/apps/desktop/src-tauri/src/indexing/CLAUDE.md
@@ -172,7 +172,7 @@ function is shared so the two gate sites can't drift apart.
 
 **Lock-first `start_indexing`, atomic phase reservation**: `start_indexing` opens a temporary `IndexStore` and atomically claims the `Disabled -> Initializing(store)` transition via `try_reserve_initializing_phase()` BEFORE constructing the heavy `IndexManager` (which spawns the writer thread). If the phase is already `Initializing`, `Running`, or `ShuttingDown`, the call is a no-op. Without the lock-first claim, two near-simultaneous calls can both spawn writer threads — each with its own `Arc<AtomicI64>` ID counter and `AccumulatorMaps` — racing on the same DB. The lock-first guard makes second-call no-op behaviour the contract; the `UNIQUE (parent_id, name_folded)` index is the safety net if the contract is ever bypassed.
 
-**`accumulator.accumulate` runs only after the DB commit succeeds**: `handle_insert_entries_v2` inserts via `IndexStore::insert_entries_v2_batch` first, and only updates `AccumulatorMaps` on `Ok`. The batch is wrapped in a SQLite savepoint that rolls back on any conflict (including UNIQUE on `(parent_id, name_folded)`); if the accumulator runs before the commit, a failed batch leaves it claiming bytes that never reached the DB, and `compute_all_aggregates_with_maps` then writes inflated `dir_stats`. The contract — "in-memory state never claims more than the DB has" — must hold under any future failure mode, even though the lock-first guard makes the failure path rare in practice.
+**`accumulator.accumulate` runs only for rows that actually landed**: `handle_insert_entries_v2` inserts via `IndexStore::insert_entries_v2_batch`, which uses `INSERT OR IGNORE` and returns a `Vec<bool>` parallel to the input. A duplicate `(parent_id, name_folded)` (case-sensitive volumes with `Foo.txt` + `foo.txt`, NFC/NFD duplicates from cross-OS sync, etc.) skips just that row and the rest of the batch survives; without this, a single collision used to roll back the whole ~2000-entry batch via the savepoint, silently dropping huge swaths of the index. `handle_insert_entries_v2` filters `entries` by the returned flags before calling `accumulator.accumulate`, so the in-memory aggregation state never claims bytes that lost the OR-IGNORE. The contract — "in-memory state never claims more than the DB has" — must hold under any future failure mode (this was one of the mechanisms behind the historical 1.83 TB ghost size on `..` of a 994 GB volume; the other was two writers racing, closed by the lock-first guard above). Conflicts are logged at WARN with sample names so the offending filesystem is diagnosable.
 
 **Progressive `index-dir-updated` emit during background verification**: `run_background_verification` emits one `index-dir-updated` event per successfully-scanned new subtree, immediately after the post-scan writer flush. Don't buffer new-dir paths and fire a single end-of-verification emit instead: that window runs up to 5 minutes for a typical home folder, and any listing opened in it stays on `<dir>` placeholders the whole time (listing-time enrichment runs before the relevant `dir_stats` rows exist; the single emit often misses the right paths because it carries `affected_paths` from the replay rather than the verification-discovered paths; and the `/` full-refresh sentinel case is filtered out by the FE's strict-descendant check). The per-subtree emit gives a progressive reveal: each folder's recursive size pops in as soon as its subtree aggregation commits. The FE handler in `index-events.ts` is throttled at 2 s per pane, so a burst of subtree completions naturally coalesces into one refresh per pane per 2 s window. The end-of-verification emit carries just `affected_paths`; `new_dir_paths` are emitted progressively.
 
@@ -271,7 +271,7 @@ per-event detail back: `RUST_LOG=cmdr_lib::indexing::reconciler=trace,cmdr_lib::
 
 **Reconciler must delete old subtree on dir-to-file type changes**: When `reconcile_subtree` matches a filesystem entry to a DB entry by name, it must check if `is_directory` changed. If a directory became a file, `DeleteSubtreeById` must be sent before `UpsertEntryV2`. Without this, `INSERT OR REPLACE` keeps the same row ID (same `parent_id + name`), and the old directory's children become logical orphans: entries parented by a file.
 
-**Scanner's `insert_entries_v2_batch` uses plain `INSERT`**: `INSERT OR REPLACE` would silently delete the old row and insert a new one with a new ID, orphaning all children. The table has two unique constraints: PK on `id`, and (since v12) the composite UNIQUE `(parent_id, name_folded)`. IDs are allocated from a shared `Arc<AtomicI64>` counter owned by `IndexWriter`, and the table is truncated before full scans (or descendants deleted before subtree scans), so under a single live writer neither conflict can occur. The plain `INSERT` reflects the invariant. If a conflict ever does fire (e.g. someone defeats the lock-first guard), the batch savepoint rolls back and `handle_insert_entries_v2` skips the accumulator update, so the in-memory aggregation state stays consistent with the DB.
+**Scanner's `insert_entries_v2_batch` uses `INSERT OR IGNORE`**: `INSERT OR REPLACE` would silently delete the old row and insert a new one with a new ID, orphaning all children — never the right move. Plain `INSERT` would roll back the entire 2000-entry batch on any conflict via the wrapping savepoint, which is catastrophic if a single filesystem oddity (two siblings folding to the same `name_folded` on a case-sensitive volume, NFC/NFD twins from cross-OS sync) takes out 1999 unrelated rows. `INSERT OR IGNORE` skips just the conflicting row; the batch returns `Vec<bool>` so `handle_insert_entries_v2` can filter the accumulator input. The table has two unique constraints: PK on `id`, and (since v12) the composite UNIQUE `(parent_id, name_folded)`. IDs are allocated from a shared `Arc<AtomicI64>` counter owned by `IndexWriter`, and the table is truncated before full scans (or descendants deleted before subtree scans), so under a single live writer PK conflicts shouldn't occur in practice. The savepoint still wraps the batch so that non-constraint errors (disk full, etc.) roll back cleanly.
 
 **IndexWriter owns the shared `next_id` counter**: All ID allocation goes through an `Arc<AtomicI64>` owned by `IndexWriter`. The scanner's `ScanContext` atomically increments it via `alloc_id()`, and the writer thread bumps it via `fetch_max` after `UpsertEntryV2` inserts (which let SQLite auto-assign). `TruncateData` resets it to 2. Don't fall back to reading `MAX(id)` from a read connection for allocation: the writer can have uncommitted inserts in its channel, so the read sees a stale value and the scanner double-assigns IDs. `IndexWriter` exposes `db_path()`. The scanner opens a temporary connection for `resolve_path` (subtree scans) and `ensure_root_sentinel` (volume scans), but not for ID allocation.
 
diff --git a/apps/desktop/src-tauri/src/indexing/store.rs b/apps/desktop/src-tauri/src/indexing/store.rs
@@ -868,21 +868,33 @@ impl IndexStore {
     ///
     /// Uses a savepoint instead of `unchecked_transaction()` so it nests correctly
     /// inside explicit transactions (replay's `BEGIN IMMEDIATE`).
-    pub fn insert_entries_v2_batch(conn: &Connection, entries: &[EntryRow]) -> Result<(), IndexStoreError> {
+    ///
+    /// Uses `INSERT OR IGNORE` so a single `(parent_id, name_folded)` collision
+    /// (case-sensitive filesystems with `Foo.txt`/`foo.txt` siblings, NFC/NFD
+    /// duplicates from cross-OS sync, etc.) skips just that row rather than
+    /// rolling back the whole batch of ~2000 entries. Returns a `Vec<bool>`
+    /// parallel to `entries` where each element indicates whether the
+    /// corresponding row actually landed in the DB. Callers (the writer
+    /// thread's accumulator) must consult this so the in-memory aggregation
+    /// state never claims more than the DB actually has.
+    pub fn insert_entries_v2_batch(conn: &Connection, entries: &[EntryRow]) -> Result<Vec<bool>, IndexStoreError> {
         if entries.is_empty() {
-            return Ok(());
+            return Ok(Vec::new());
         }
         with_savepoint(conn, "insert_entries", |conn| {
-            // Plain INSERT: IDs are allocated from a shared AtomicI64 counter owned by
-            // IndexWriter, so conflicts shouldn't occur. The table is truncated before
-            // full scans and descendants are deleted before subtree scans.
+            // INSERT OR IGNORE: the table is truncated before full scans and
+            // descendants are deleted before subtree scans, so collisions
+            // against existing rows are rare, but two siblings with the same
+            // `name_folded` can show up on case-sensitive volumes / sync
+            // sources. Skip the duplicate, keep the rest.
             let mut stmt = conn.prepare_cached(
-                "INSERT INTO entries (id, parent_id, name, name_folded, is_directory, is_symlink, logical_size, physical_size, modified_at, inode)
+                "INSERT OR IGNORE INTO entries (id, parent_id, name, name_folded, is_directory, is_symlink, logical_size, physical_size, modified_at, inode)
                  VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10)",
             )?;
+            let mut inserted = Vec::with_capacity(entries.len());
             for e in entries {
                 let name_folded = normalize_for_comparison(&e.name);
-                stmt.execute(params![
+                let rows = stmt.execute(params![
                     e.id,
                     e.parent_id,
                     e.name,
@@ -894,8 +906,9 @@ impl IndexStore {
                     e.modified_at,
                     e.inode,
                 ])?;
+                inserted.push(rows == 1);
             }
-            Ok(())
+            Ok(inserted)
         })
     }
 
@@ -2049,8 +2062,15 @@ mod tests {
         );
     }
 
+    /// Batch insert uses `INSERT OR IGNORE`: a duplicate `(parent_id, name_folded)`
+    /// in the batch (or against an existing row) skips just that row, keeping
+    /// every other entry in the batch. The returned `Vec<bool>` flags which
+    /// rows actually landed. This replaces the previous roll-back-the-whole-batch
+    /// behavior, which silently dropped ~2000 unrelated entries every time a
+    /// scan encountered two siblings with colliding `name_folded` (case-sensitive
+    /// volumes, NFC/NFD duplicates, etc.).
     #[test]
-    fn duplicate_parent_name_folded_rejected_batch_insert() {
+    fn duplicate_parent_name_folded_skipped_in_batch_insert() {
         let (_store, dir) = open_temp_store();
         let db_path = dir.path().join("test-index.db");
         let conn = IndexStore::open_write_connection(&db_path).unwrap();
@@ -2073,21 +2093,31 @@ mod tests {
                 name: "dup.txt".into(),
                 is_directory: false,
                 is_symlink: false,
-                logical_size: Some(10),
-                physical_size: Some(10),
+                logical_size: Some(20),
+                physical_size: Some(20),
+                modified_at: None,
+                inode: None,
+            },
+            EntryRow {
+                id: 102,
+                parent_id: ROOT_ID,
+                name: "unrelated.txt".into(),
+                is_directory: false,
+                is_symlink: false,
+                logical_size: Some(30),
+                physical_size: Some(30),
                 modified_at: None,
                 inode: None,
             },
         ];
-        let result = IndexStore::insert_entries_v2_batch(&conn, &entries);
-        assert!(
-            result.is_err(),
-            "batch with duplicate (parent_id, name_folded) must fail; got {result:?}"
-        );
+        let inserted = IndexStore::insert_entries_v2_batch(&conn, &entries).unwrap();
+        assert_eq!(inserted, vec![true, false, true]);
 
-        // Savepoint rolls back the whole batch, so neither row should be present.
-        assert!(IndexStore::get_entry_by_id(&conn, 100).unwrap().is_none());
+        // First duplicate wins; the second is dropped; the unrelated entry survives.
+        // Without the per-row skip, the savepoint used to roll back ALL THREE.
+        assert!(IndexStore::get_entry_by_id(&conn, 100).unwrap().is_some());
         assert!(IndexStore::get_entry_by_id(&conn, 101).unwrap().is_none());
+        assert!(IndexStore::get_entry_by_id(&conn, 102).unwrap().is_some());
     }
 
     #[cfg(target_os = "macos")]
diff --git a/apps/desktop/src-tauri/src/indexing/writer.rs b/apps/desktop/src-tauri/src/indexing/writer.rs
@@ -468,10 +468,12 @@ impl AccumulatorMaps {
         }
     }
 
-    /// Accumulate stats from a batch of inserted entries.
-    fn accumulate(&mut self, entries: &[EntryRow]) {
-        self.entries_inserted += entries.len() as u64;
+    /// Accumulate stats from a set of inserted entries. Accepts any iterator
+    /// of `&EntryRow` so callers can pre-filter (skipping rows that lost a
+    /// UNIQUE conflict during `INSERT OR IGNORE`) without an extra clone.
+    fn accumulate<'a>(&mut self, entries: impl IntoIterator<Item = &'a EntryRow>) {
         for entry in entries {
+            self.entries_inserted += 1;
             let stats = self.direct_stats.entry(entry.parent_id).or_insert((0, 0, 0, 0, false));
             if entry.is_symlink {
                 stats.4 = true;
@@ -817,14 +819,45 @@ fn handle_insert_entries_v2(
 ) {
     let count = entries.len();
     let t = Instant::now();
-    // Accumulate AFTER the DB commit succeeds. The savepoint inside
-    // `insert_entries_v2_batch` rolls the whole batch back on any conflict
-    // (PK or UNIQUE on `(parent_id, name_folded)` — schema v12), so the
-    // accumulator must not claim rows that never landed; otherwise
-    // `compute_all_aggregates_with_maps` inflates `dir_stats` with phantom
-    // bytes.
+    // Accumulate AFTER the DB commit succeeds. `insert_entries_v2_batch`
+    // uses `INSERT OR IGNORE`, so a UNIQUE conflict on
+    // `(parent_id, name_folded)` (case-sensitive volumes with `Foo.txt` and
+    // `foo.txt` siblings, NFC/NFD duplicates from cross-OS sync, etc.) skips
+    // just that row instead of rolling back the entire 2000-entry batch. The
+    // accumulator must skip those rows too, or `compute_all_aggregates_with_maps`
+    // inflates `dir_stats` with phantom bytes (the constraint comment that
+    // called out "1.83 TB ghost size on a 994 GB volume" is exactly this
+    // failure mode).
     match IndexStore::insert_entries_v2_batch(conn, &entries) {
-        Ok(()) => accumulator.accumulate(&entries),
+        Ok(inserted) => {
+            let skipped_count = inserted.iter().filter(|landed| !**landed).count();
+            if skipped_count == 0 {
+                accumulator.accumulate(&entries);
+            } else {
+                accumulator.accumulate(
+                    entries
+                        .iter()
+                        .zip(inserted.iter())
+                        .filter_map(|(e, landed)| if *landed { Some(e) } else { None }),
+                );
+                let samples: Vec<(i64, &str)> = entries
+                    .iter()
+                    .zip(inserted.iter())
+                    .filter_map(|(e, landed)| {
+                        if !*landed {
+                            Some((e.parent_id, e.name.as_str()))
+                        } else {
+                            None
+                        }
+                    })
+                    .take(3)
+                    .collect();
+                log::warn!(
+                    "Index writer: {skipped_count} of {batch_size} skipped due to UNIQUE conflict on (parent_id, name_folded); sample: {samples:?}",
+                    batch_size = pluralize_with(count as u64, "entry", "entries")
+                );
+            }
+        }
         Err(e) => crate::log_error!("Index writer: insert_entries_v2_batch failed: {e}"),
     }
     let elapsed = t.elapsed().as_millis();
@@ -1642,22 +1675,20 @@ mod tests {
         writer.shutdown();
     }
 
-    // The accumulator must NOT advance when the batch INSERT fails: the
-    // accumulator maps drive `compute_all_aggregates_with_maps`, and counting
-    // bytes for rows that never landed in the DB produces inflated dir_stats.
-    // This was the second mechanism behind the 1.83 TB ghost size on `..` —
-    // the first being two writers racing (Fix #1), the schema admitting
-    // duplicates (Fix #3) finalises it.
+    // The accumulator must only count rows that actually landed in the DB.
+    // `insert_entries_v2_batch` uses `INSERT OR IGNORE`, so one duplicate in
+    // a batch skips just that row and the rest insert. The accumulator maps
+    // drive `compute_all_aggregates_with_maps`; counting bytes for a row that
+    // lost an OR-IGNORE produces inflated dir_stats (this was one of the
+    // mechanisms behind the 1.83 TB ghost size on `..` of a 994 GB volume).
     #[test]
-    fn handle_insert_entries_v2_does_not_accumulate_when_db_insert_fails() {
+    fn handle_insert_entries_v2_only_accumulates_rows_that_landed() {
         use std::sync::atomic::AtomicU64;
 
         let (db_path, _dir) = setup_db();
         let conn = IndexStore::open_write_connection(&db_path).unwrap();
 
-        // Pre-seed a row with id=100 so a second insert with the same id
-        // collides on the integer PK (UNIQUE on entries.id is enforced
-        // independently of the (parent_id, name_folded) index).
+        // Pre-seed: id=100, name="first.txt".
         let entries_first = vec![EntryRow {
             id: 100,
             parent_id: ROOT_ID,
@@ -1671,17 +1702,17 @@ mod tests {
         }];
         IndexStore::insert_entries_v2_batch(&conn, &entries_first).unwrap();
 
-        // Second batch: re-uses id=100. The savepoint rolls back the whole batch
-        // on the PK collision, so neither entry lands in the DB.
+        // Second batch: row 0 collides on the (parent_id, name_folded) UNIQUE
+        // index (same `first.txt` under ROOT_ID). Row 1 is fresh and must land.
         let entries_dup = vec![
             EntryRow {
-                id: 100,
+                id: 200,
                 parent_id: ROOT_ID,
                 name: "first.txt".into(),
                 is_directory: false,
                 is_symlink: false,
-                logical_size: Some(10),
-                physical_size: Some(10),
+                logical_size: Some(999_999),
+                physical_size: Some(999_999),
                 modified_at: None,
                 inode: None,
             },
@@ -1711,23 +1742,29 @@ mod tests {
             &mutation_counter,
         );
 
-        // DB unchanged: only the first.txt row from the seed.
+        // DB has the original first.txt (id=100) and the new second.txt (id=101).
+        // id=200 was the OR-IGNORE'd duplicate and must not be in the DB.
         assert_eq!(
             IndexStore::get_entry_by_id(&conn, 100).unwrap().unwrap().name,
             "first.txt"
         );
-        assert!(IndexStore::get_entry_by_id(&conn, 101).unwrap().is_none());
-
-        // Accumulator must reflect that 0 entries were inserted from this batch.
         assert_eq!(
-            accumulator.entries_inserted, 0,
-            "accumulator advanced despite failed INSERT (would feed inflated dir_stats into aggregation)"
+            IndexStore::get_entry_by_id(&conn, 101).unwrap().unwrap().name,
+            "second.txt"
         );
-        assert!(
-            accumulator.direct_stats.is_empty(),
-            "accumulator.direct_stats populated despite failed INSERT: {:?}",
-            accumulator.direct_stats
+        assert!(IndexStore::get_entry_by_id(&conn, 200).unwrap().is_none());
+
+        // Accumulator must reflect exactly one new entry (the row that landed),
+        // never the 999_999-byte phantom. If a regression makes the accumulator
+        // count the OR-IGNORE'd row, this assert catches it.
+        assert_eq!(
+            accumulator.entries_inserted, 1,
+            "accumulator must count only rows that landed in the DB"
         );
+        let stats = accumulator.direct_stats.get(&ROOT_ID).expect("ROOT_ID stats present");
+        assert_eq!(stats.0, 20, "logical bytes must only count the landed row");
+        assert_eq!(stats.1, 20, "physical bytes must only count the landed row");
+        assert_eq!(stats.2, 1, "file count must only include the landed row");
     }
 
     #[test]