Skip to content

Commit 0236723

Browse files
committed
Bugfix: index can't be corrupted by concurrent start_indexing calls
Three-layer close on the duplicate-rows bug that produced the 1.83 TB ghost size on `~/..` (~1.84x the actual volume usage). #1 — Lock-first `start_indexing`. The previous flow constructed `IndexManager::new` (which spawns the writer thread) *before* acquiring the `INDEXING` lock, then unconditionally wrote `IndexPhase::Initializing { store }` over whatever phase was there. Two near-simultaneous calls both spawned writer threads on the same DB, each with its own `Arc<AtomicI64>` ID counter and `AccumulatorMaps`, racing on inserts. The fix introduces `try_reserve_initializing_phase(store)` which atomically transitions `Disabled -> Initializing(store)` and returns the store back on conflict; `start_indexing` opens just the lightweight init store first, attempts the reservation, and bails out as a no-op if the phase isn't `Disabled`. #2 — `accumulator.accumulate` runs only after the DB INSERT commits. `handle_insert_entries_v2` previously updated `AccumulatorMaps` before calling `insert_entries_v2_batch`. The savepoint rolls back the batch on any conflict, so a failed insert left the accumulator claiming bytes that never reached the DB, and `compute_all_aggregates_with_maps` then wrote inflated `dir_stats`. Two-line reorder: insert first, accumulate on `Ok`. #3 — Schema v12: `UNIQUE (parent_id, name_folded)`. The v5 perf reason for dropping the constraint was the `platform_case` collation B-tree cost; v6 sidestepped that by pre-computing `name_folded` at insert time, so a unique index on the folded column uses binary collation and builds in seconds. The DB now physically rejects the bug class even if the lock-first guard is ever defeated. TDD: each fix lands its red test first, in the same commit-author order as the fixes — `duplicate_parent_name_folded_rejected_individual_insert` / `_batch_insert` (store), `handle_insert_entries_v2_does_not_accumulate_when_db_insert_fails` (writer), `try_reserve_initializing_succeeds_only_from_disabled` (mod.rs). Existing `platform_case_collation_macos` test flipped from "duplicate accepted" to "duplicate rejected" to match the new UNIQUE invariant. All 2119 cmdr Rust tests + 32 integration tests pass; clippy clean. Pre-existing svelte-check failure in `QueryDialog.svelte.test.ts:167` is unrelated and exists on `main` (verified by stashing this work and re-running the check).
1 parent a8096a2 commit 0236723

5 files changed

Lines changed: 278 additions & 25 deletions

File tree

apps/desktop/src-tauri/src/indexing/CLAUDE.md

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ Design history is in git (former `docs/specs/drive-indexing/`).
1515
- **expected_totals.rs** -- `expected_totals_for_sources()` returns the index-derived `(file count, byte total)` for a set of source paths so write operations (copy/move/delete) can render a real scan-phase progress bar before the foolproof re-scan completes. Per source: resolve_path → get_entry_by_id → if dir use `dir_stats`, if file use the entry's `logical_size`. Returns `None` if *any* source isn't covered by the index (no pool, no entry, no `dir_stats`, no `logical_size`). Partial totals would let the progress bar overshoot 100%. Uses the same `ReadPool` as enrichment for lock-free reads. Used by `scan_preview.rs` and `scan.rs` in `write_operations/`.
1616
- **event_loop.rs** -- `run_live_event_loop` (real-time FSEvents/inotify processing after scan completes), `run_replay_event_loop` (cold-start journal replay with two-phase approach), `run_background_verification` (post-replay bidirectional readdir diff), `merge_fs_events` (deduplication with flag priority), `process_live_batch` (three-phase: directory creations first sorted by depth + flush, then `detect_renames_by_inode` rename pre-pass + flush, then remaining events). The rename pre-pass turns `item_renamed` events whose new path's inode already lives in the DB at a different `(parent_id, name)` into a single `MoveEntryV2`, preserving the entry's `dir_stats`. All bounded-buffer constants live here.
1717
- **events.rs** -- Tauri event payload structs (`IndexScanStartedEvent`, `IndexScanProgressEvent`, `IndexScanCompleteEvent`, `IndexDirUpdatedEvent`, `IndexReplayProgressEvent`, `IndexReplayCompleteEvent`), `RescanReason` enum, `emit_rescan_notification()`, IPC response types (`IndexStatusResponse`, `IndexDebugStatusResponse`). Also: `ActivityPhase` enum (Replaying/Scanning/Aggregating/Reconciling/Live/Idle), `PhaseRecord` for the phase timeline, and `DebugStats` (shared atomic counters for the debug window + phase timeline via `set_phase()`/`close_phase_with_stats()`).
18-
- **store.rs** -- SQLite schema v9 (integer-keyed entries with `name_folded` column on all platforms, `inode` column for hardlink dedup, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. `resolve_component` always queries by `(parent_id, name_folded)` using the `idx_parent_name_folded` composite index. On Linux/Windows, `normalize_for_comparison()` is the identity function, so `name_folded = name` and the index behaves identically to a `(parent_id, name)` index. Schema version check: mismatch triggers drop+rebuild. v7 added dual sizes (logical + physical). v8 adds `inode INTEGER` column and `idx_inode` index for hardlink dedup at write time. v9 unifies `name_folded` across all platforms (eliminates platform-conditional SQL). `has_sized_entry_for_inode()` checks if another entry with the same inode already has non-NULL sizes; `find_entry_by_inode()` returns the first row with a given inode (used by the live event loop's rename pre-pass). Both path-keyed (backward compat) and integer-keyed APIs.
18+
- **store.rs** -- SQLite schema v12 (integer-keyed entries with `name_folded` column on all platforms, `inode` column for hardlink dedup, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. `resolve_component` always queries by `(parent_id, name_folded)` using the `idx_parent_name_folded` composite **UNIQUE** index. On Linux/Windows, `normalize_for_comparison()` is the identity function, so `name_folded = name` and the index behaves identically to a `(parent_id, name)` index. Schema version check: mismatch triggers drop+rebuild. v7 added dual sizes (logical + physical). v8 adds `inode INTEGER` column and `idx_inode` index for hardlink dedup at write time. v9 unifies `name_folded` across all platforms (eliminates platform-conditional SQL). v12 reinstates `UNIQUE` on `(parent_id, name_folded)` so the DB physically rejects duplicate rows (the v5 perf reason no longer applies since v6 pre-computes `name_folded`). `has_sized_entry_for_inode()` checks if another entry with the same inode already has non-NULL sizes; `find_entry_by_inode()` returns the first row with a given inode (used by the live event loop's rename pre-pass). Both path-keyed (backward compat) and integer-keyed APIs.
1919
- **metadata.rs** -- `MetadataSnapshot` struct and `extract_metadata()` function. Single location for all platform-specific metadata extraction (logical/physical size, mtime, inode, nlink). Used by scanner, reconciler, verifier, and event_loop. Symlinks get `None` everywhere. Files get sizes + inode + nlink. Directories get inode but no sizes/nlink. The inode is what the live event loop's rename pre-pass matches against to detect dir renames in place.
2020
- **memory_watchdog.rs** -- Background task monitoring resident memory via `mach_task_info` (macOS). Warns at 8 GB, stops indexing at 16 GB, emits `index-memory-warning` event to frontend. No-op stub on non-macOS. Started from `start_indexing()`.
2121
- **writer.rs** -- Single writer thread, owns the write connection, processes `WriteMessage` channel (bounded `sync_channel`, 20K capacity, backpressure via blocking). `WRITER_GENERATION: AtomicU64` (initialized to 1) bumped on every mutation (`InsertEntriesV2`, `UpsertEntryV2`, `MoveEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `TruncateData`) for search index staleness detection. Priority: `UpdateDirStats` before `InsertEntries`. `Flush` variant + async `flush()` method let callers wait for all prior writes to commit. Has both integer-keyed variants (`InsertEntriesV2`, `UpsertEntryV2`, `MoveEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `PropagateDeltaById`) and path-keyed backward-compat variants. `MoveEntryV2 { entry_id, new_parent_id, new_name }` updates an entry's `(parent_id, name, name_folded)` in place, preserving its `id` and (for directories) `dir_stats`. Same-parent renames don't change ancestor totals; cross-parent moves subtract the entry's contribution from the old ancestor chain and add it to the new one (and recompute the OR-aggregated `recursive_has_symlinks` flag on both chains). The integer-keyed delete/subtree-delete handlers auto-propagate negative deltas via the `parent_id` chain (same pattern as the path-keyed variants). `propagate_delta_by_id` walks the parent chain using `get_parent_id` lookups. `UpsertEntryV2` auto-propagates deltas on both insert and update: on insert, propagates the full size (+file_count or +dir_count); on update, reads the old entry first and propagates only the size difference. This means callers never need a separate `PropagateDeltaById` for upserted entries. For new directories, also initializes a zero-valued `dir_stats` row so enrichment always has a row. Maintains `AccumulatorMaps` during `InsertEntriesV2` processing (two HashMaps: direct children stats and child dir relationships + an `entries_inserted` counter), cleared on `TruncateData`. On `ComputeAllAggregates`, passes accumulated maps to `aggregator::compute_all_aggregates_with_maps()` to skip expensive full-table-scan SQL queries. Accepts an optional `AppHandle` at spawn time to emit `index-aggregation-progress` events during aggregation (phase, current, total). Also emits `saving_entries` phase progress during `InsertEntriesV2` processing when the expected total is set via `set_expected_total_entries()` (an `Arc<AtomicU64>` shared between the writer thread and the `IndexWriter` handle). No index drop/recreate dance. The `idx_parent_name_folded` composite index uses binary collation and stays present during scans.
@@ -81,15 +81,15 @@ All writes go through a dedicated `std::thread` via a bounded `sync_channel` (20
8181

8282
Reads happen on separate WAL connections (any thread). A `ReadPool` provides thread-local read connections for enrichment and verification without contending on the `INDEXING` state-machine mutex.
8383

84-
### SQLite schema (v11: integer-keyed, unified `name_folded` column, inode for hardlink dedup + dir rename detection, `recursive_has_symlinks` flag)
84+
### SQLite schema (v12: integer-keyed, unified `name_folded` column, inode for hardlink dedup + dir rename detection, `recursive_has_symlinks` flag, UNIQUE `(parent_id, name_folded)`)
8585

8686
One DB per volume. **Dev and prod use separate directories** (see AGENTS.md § Debugging):
8787
- **Prod**: `~/Library/Application Support/com.veszelovszki.cmdr/index-{volume_id}.db`
8888
- **Dev**: `~/Library/Application Support/com.veszelovszki.cmdr-dev/index-{volume_id}.db`
8989

9090
Three tables:
9191
- `entries` (id INTEGER PK, parent_id, name COLLATE platform_case, name_folded, is_directory, is_symlink, logical_size, physical_size, modified_at, inode). Root sentinel: id=1, parent_id=0, name="".
92-
- `name_folded TEXT NOT NULL` stores `normalize_for_comparison(name)` on all platforms. On macOS this is NFD + case fold; on Linux/Windows it's the identity (same as `name`). Index: `idx_parent_name_folded ON entries (parent_id, name_folded)`.
92+
- `name_folded TEXT NOT NULL` stores `normalize_for_comparison(name)` on all platforms. On macOS this is NFD + case fold; on Linux/Windows it's the identity (same as `name`). Index: `idx_parent_name_folded ON entries (parent_id, name_folded)` (UNIQUE since v12).
9393
- The old `idx_parent(parent_id)` from v5 is removed; the composite index replaces it.
9494
- `dir_stats` (entry_id INTEGER PK, recursive_logical_size, recursive_physical_size, recursive_file_count, recursive_dir_count, recursive_has_symlinks)
9595
- `meta` (key TEXT PK, value TEXT) WITHOUT ROWID
@@ -106,6 +106,7 @@ History of changes:
106106
- **Schema v9**: Unified `name_folded` column and `idx_parent_name_folded` index across all platforms (previously macOS only). On Linux/Windows, `normalize_for_comparison()` is the identity function, so `name_folded = name`, one extra column with no behavioral change. Eliminates 12 `#[cfg(target_os)]` blocks from store.rs SQL.
107107
- **Schema v10**: Added `recursive_has_symlinks INTEGER NOT NULL DEFAULT 0` to `dir_stats`. OR-aggregated bottom-up: a directory's flag is `true` iff any direct child is a symlink OR any child directory's stored flag is `true`. Surfaced via `DirStats.recursive_has_symlinks` (IPC) and `FileEntry.recursive_has_symlinks` (enrichment). The frontend renders an inline `(i)` icon next to a folder's size in `SelectionInfo` to explain why a symlink-only folder shows 0 bytes (we follow `du`/Finder's behavior of not double-counting symlinked content). Schema bump triggers a full re-scan on next launch; no online migration. The index is a disposable cache.
108108
- **Schema v11**: No SQL change. Bumped to force a rebuild so the scanner repopulates `entries.inode` for directories (used to be `None`; see `metadata.rs`). Required for inode-based rename detection in the live event loop's `detect_renames_by_inode` pre-pass to actually do anything on existing user databases; without the bump, every directory in a pre-v11 DB would have `inode IS NULL` and `find_entry_by_inode` would never match.
109+
- **Schema v12**: Reinstated `UNIQUE` on `(parent_id, name_folded)` (the constraint v5 dropped). v5's perf reason was the `platform_case` collation B-tree cost; v6 sidestepped that by pre-computing `name_folded` at insert time, so the unique index now uses binary collation and builds in seconds. Without the constraint, two concurrent `IndexWriter` instances racing on the same DB (each with its own `Arc<AtomicI64>` ID counter) could insert distinct rows for the same `(parent_id, name)`, which the aggregator then double-counted into `dir_stats` — observed as a 1.83 TB ghost size on the `..` row of a 994 GB volume. The state-machine fix (lock-first `start_indexing`) closes the trigger; this schema constraint is the safety net.
109110

110111
## How to test
111112

@@ -169,6 +170,10 @@ function is shared so the two gate sites can't drift apart.
169170

170171
**Single-writer thread, not connection pooling**: SQLite write concurrency is limited by its single-writer design. Instead of fighting it with `BUSY_TIMEOUT` and retries, one dedicated thread owns the write connection. Eliminates contention entirely.
171172

173+
**Lock-first `start_indexing`, atomic phase reservation**: `start_indexing` opens a temporary `IndexStore` and atomically claims the `Disabled -> Initializing(store)` transition via `try_reserve_initializing_phase()` BEFORE constructing the heavy `IndexManager` (which spawns the writer thread). If the phase is already `Initializing`, `Running`, or `ShuttingDown`, the call is a no-op. The previous design did the inline assignment `*guard = Initializing { store }` AFTER `IndexManager::new`, so two near-simultaneous calls both spawned writer threads — each with its own `Arc<AtomicI64>` ID counter and `AccumulatorMaps` — racing on the same DB. The lock-first guard makes second-call no-op behaviour the contract; the v12 UNIQUE index is the safety net if the contract is ever bypassed.
174+
175+
**`accumulator.accumulate` runs only after the DB commit succeeds**: `handle_insert_entries_v2` previously updated `AccumulatorMaps` before calling `IndexStore::insert_entries_v2_batch`. The batch is wrapped in a SQLite savepoint that rolls back on any conflict (now including UNIQUE on `(parent_id, name_folded)`), so a failed batch left the accumulator claiming bytes that never reached the DB; `compute_all_aggregates_with_maps` then wrote inflated `dir_stats`. The fix is a two-line reorder: insert first, accumulate on `Ok`. With the lock-first guard preventing duplicate writers, the failure path is rare in practice, but the contract — "in-memory state never claims more than the DB has" — should hold under any future failure mode.
176+
172177
**Index enrichment at read time, not cache time**: `recursive_size` fields are populated on every `get_file_range` call via a batch SQLite read from `dir_stats`. This avoids stale data and keeps enrichment consistent with the latest DB state. The cost is microseconds per page on a WAL connection.
173178

174179
**Enrichment uses integer-keyed batch lookup**: Instead of N individual `resolve_path()` calls (one per directory in the listing), `enrich_entries_with_index` resolves the parent directory once, queries `list_child_dir_ids_and_names(parent_id)` for all child dir IDs, then `get_dir_stats_batch_by_ids()`. Two indexed queries total instead of N. Falls back to individual path resolution for edge cases (for example, mixed-parent entries).
@@ -262,7 +267,7 @@ per-event detail back: `RUST_LOG=cmdr_lib::indexing::reconciler=trace,cmdr_lib::
262267

263268
**Reconciler must delete old subtree on dir-to-file type changes**: When `reconcile_subtree` matches a filesystem entry to a DB entry by name, it must check if `is_directory` changed. If a directory became a file, `DeleteSubtreeById` must be sent before `UpsertEntryV2`. Without this, `INSERT OR REPLACE` keeps the same row ID (same `parent_id + name`), and the old directory's children become logical orphans: entries parented by a file.
264269

265-
**Scanner's `insert_entries_v2_batch` uses plain `INSERT`**: With the old `idx_parent_name` unique index, `INSERT OR REPLACE` would silently delete the old row and insert a new one with a new ID, orphaning all children. That unique index is gone (replaced by `idx_parent_name_folded` on macOS / `idx_parent_name` on Linux), and the only unique constraint is the integer PK (`id`). IDs are allocated from a shared `Arc<AtomicI64>` counter owned by `IndexWriter`, and the table is truncated before full scans (or descendants deleted before subtree scans), so PK conflicts shouldn't occur. The batch insert uses plain `INSERT` to reflect this.
270+
**Scanner's `insert_entries_v2_batch` uses plain `INSERT`**: `INSERT OR REPLACE` would silently delete the old row and insert a new one with a new ID, orphaning all children. The table has two unique constraints: PK on `id`, and (since v12) the composite UNIQUE `(parent_id, name_folded)`. IDs are allocated from a shared `Arc<AtomicI64>` counter owned by `IndexWriter`, and the table is truncated before full scans (or descendants deleted before subtree scans), so under a single live writer neither conflict can occur. The plain `INSERT` reflects the invariant. If a conflict ever does fire (e.g. someone defeats the lock-first guard), the batch savepoint rolls back and `handle_insert_entries_v2` skips the accumulator update, so the in-memory aggregation state stays consistent with the DB.
266271

267272
**IndexWriter owns the shared `next_id` counter**: All ID allocation goes through an `Arc<AtomicI64>` owned by `IndexWriter`. The scanner's `ScanContext` atomically increments it via `alloc_id()`, and the writer thread bumps it via `fetch_max` after `UpsertEntryV2` inserts (which let SQLite auto-assign). `TruncateData` resets it to 2. This eliminates the race where `ScanContext` previously read a stale `MAX(id)` from a read connection while the writer had uncommitted inserts in its channel. `IndexWriter` still exposes `db_path()`. The scanner opens a temporary connection for `resolve_path` (subtree scans) and `ensure_root_sentinel` (volume scans), but no longer for ID allocation.
268273

0 commit comments

Comments
 (0)