You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Bugfix: index can't be corrupted by concurrent start_indexing calls
Three-layer close on the duplicate-rows bug that produced the 1.83 TB ghost size on `~/..` (~1.84x the actual volume usage).
#1 — Lock-first `start_indexing`. The previous flow constructed `IndexManager::new` (which spawns the writer thread) *before* acquiring the `INDEXING` lock, then unconditionally wrote `IndexPhase::Initializing { store }` over whatever phase was there. Two near-simultaneous calls both spawned writer threads on the same DB, each with its own `Arc<AtomicI64>` ID counter and `AccumulatorMaps`, racing on inserts. The fix introduces `try_reserve_initializing_phase(store)` which atomically transitions `Disabled -> Initializing(store)` and returns the store back on conflict; `start_indexing` opens just the lightweight init store first, attempts the reservation, and bails out as a no-op if the phase isn't `Disabled`.
#2 — `accumulator.accumulate` runs only after the DB INSERT commits. `handle_insert_entries_v2` previously updated `AccumulatorMaps` before calling `insert_entries_v2_batch`. The savepoint rolls back the batch on any conflict, so a failed insert left the accumulator claiming bytes that never reached the DB, and `compute_all_aggregates_with_maps` then wrote inflated `dir_stats`. Two-line reorder: insert first, accumulate on `Ok`.
#3 — Schema v12: `UNIQUE (parent_id, name_folded)`. The v5 perf reason for dropping the constraint was the `platform_case` collation B-tree cost; v6 sidestepped that by pre-computing `name_folded` at insert time, so a unique index on the folded column uses binary collation and builds in seconds. The DB now physically rejects the bug class even if the lock-first guard is ever defeated.
TDD: each fix lands its red test first, in the same commit-author order as the fixes — `duplicate_parent_name_folded_rejected_individual_insert` / `_batch_insert` (store), `handle_insert_entries_v2_does_not_accumulate_when_db_insert_fails` (writer), `try_reserve_initializing_succeeds_only_from_disabled` (mod.rs). Existing `platform_case_collation_macos` test flipped from "duplicate accepted" to "duplicate rejected" to match the new UNIQUE invariant. All 2119 cmdr Rust tests + 32 integration tests pass; clippy clean.
Pre-existing svelte-check failure in `QueryDialog.svelte.test.ts:167` is unrelated and exists on `main` (verified by stashing this work and re-running the check).
Copy file name to clipboardExpand all lines: apps/desktop/src-tauri/src/indexing/CLAUDE.md
+9-4Lines changed: 9 additions & 4 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -15,7 +15,7 @@ Design history is in git (former `docs/specs/drive-indexing/`).
15
15
-**expected_totals.rs** -- `expected_totals_for_sources()` returns the index-derived `(file count, byte total)` for a set of source paths so write operations (copy/move/delete) can render a real scan-phase progress bar before the foolproof re-scan completes. Per source: resolve_path → get_entry_by_id → if dir use `dir_stats`, if file use the entry's `logical_size`. Returns `None` if *any* source isn't covered by the index (no pool, no entry, no `dir_stats`, no `logical_size`). Partial totals would let the progress bar overshoot 100%. Uses the same `ReadPool` as enrichment for lock-free reads. Used by `scan_preview.rs` and `scan.rs` in `write_operations/`.
16
16
-**event_loop.rs** -- `run_live_event_loop` (real-time FSEvents/inotify processing after scan completes), `run_replay_event_loop` (cold-start journal replay with two-phase approach), `run_background_verification` (post-replay bidirectional readdir diff), `merge_fs_events` (deduplication with flag priority), `process_live_batch` (three-phase: directory creations first sorted by depth + flush, then `detect_renames_by_inode` rename pre-pass + flush, then remaining events). The rename pre-pass turns `item_renamed` events whose new path's inode already lives in the DB at a different `(parent_id, name)` into a single `MoveEntryV2`, preserving the entry's `dir_stats`. All bounded-buffer constants live here.
17
17
-**events.rs** -- Tauri event payload structs (`IndexScanStartedEvent`, `IndexScanProgressEvent`, `IndexScanCompleteEvent`, `IndexDirUpdatedEvent`, `IndexReplayProgressEvent`, `IndexReplayCompleteEvent`), `RescanReason` enum, `emit_rescan_notification()`, IPC response types (`IndexStatusResponse`, `IndexDebugStatusResponse`). Also: `ActivityPhase` enum (Replaying/Scanning/Aggregating/Reconciling/Live/Idle), `PhaseRecord` for the phase timeline, and `DebugStats` (shared atomic counters for the debug window + phase timeline via `set_phase()`/`close_phase_with_stats()`).
18
-
- **store.rs** -- SQLite schema v9 (integer-keyed entries with `name_folded` column on all platforms, `inode` column for hardlink dedup, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. `resolve_component` always queries by `(parent_id, name_folded)` using the `idx_parent_name_folded` composite index. On Linux/Windows, `normalize_for_comparison()` is the identity function, so `name_folded = name` and the index behaves identically to a `(parent_id, name)` index. Schema version check: mismatch triggers drop+rebuild. v7 added dual sizes (logical + physical). v8 adds `inode INTEGER` column and `idx_inode` index for hardlink dedup at write time. v9 unifies `name_folded` across all platforms (eliminates platform-conditional SQL). `has_sized_entry_for_inode()` checks if another entry with the same inode already has non-NULL sizes; `find_entry_by_inode()` returns the first row with a given inode (used by the live event loop's rename pre-pass). Both path-keyed (backward compat) and integer-keyed APIs.
18
+
- **store.rs** -- SQLite schema v12 (integer-keyed entries with `name_folded` column on all platforms, `inode` column for hardlink dedup, dir_stats by entry_id, meta), platform_case collation, read queries, DB open/migrate. `resolve_component` always queries by `(parent_id, name_folded)` using the `idx_parent_name_folded` composite **UNIQUE** index. On Linux/Windows, `normalize_for_comparison()` is the identity function, so `name_folded = name` and the index behaves identically to a `(parent_id, name)` index. Schema version check: mismatch triggers drop+rebuild. v7 added dual sizes (logical + physical). v8 adds `inode INTEGER` column and `idx_inode` index for hardlink dedup at write time. v9 unifies `name_folded` across all platforms (eliminates platform-conditional SQL). v12 reinstates `UNIQUE` on `(parent_id, name_folded)` so the DB physically rejects duplicate rows (the v5 perf reason no longer applies since v6 pre-computes `name_folded`). `has_sized_entry_for_inode()` checks if another entry with the same inode already has non-NULL sizes; `find_entry_by_inode()` returns the first row with a given inode (used by the live event loop's rename pre-pass). Both path-keyed (backward compat) and integer-keyed APIs.
19
19
-**metadata.rs** -- `MetadataSnapshot` struct and `extract_metadata()` function. Single location for all platform-specific metadata extraction (logical/physical size, mtime, inode, nlink). Used by scanner, reconciler, verifier, and event_loop. Symlinks get `None` everywhere. Files get sizes + inode + nlink. Directories get inode but no sizes/nlink. The inode is what the live event loop's rename pre-pass matches against to detect dir renames in place.
20
20
-**memory_watchdog.rs** -- Background task monitoring resident memory via `mach_task_info` (macOS). Warns at 8 GB, stops indexing at 16 GB, emits `index-memory-warning` event to frontend. No-op stub on non-macOS. Started from `start_indexing()`.
21
21
- **writer.rs** -- Single writer thread, owns the write connection, processes `WriteMessage` channel (bounded `sync_channel`, 20K capacity, backpressure via blocking). `WRITER_GENERATION: AtomicU64` (initialized to 1) bumped on every mutation (`InsertEntriesV2`, `UpsertEntryV2`, `MoveEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `TruncateData`) for search index staleness detection. Priority: `UpdateDirStats` before `InsertEntries`. `Flush` variant + async `flush()` method let callers wait for all prior writes to commit. Has both integer-keyed variants (`InsertEntriesV2`, `UpsertEntryV2`, `MoveEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `PropagateDeltaById`) and path-keyed backward-compat variants. `MoveEntryV2 { entry_id, new_parent_id, new_name }` updates an entry's `(parent_id, name, name_folded)` in place, preserving its `id` and (for directories) `dir_stats`. Same-parent renames don't change ancestor totals; cross-parent moves subtract the entry's contribution from the old ancestor chain and add it to the new one (and recompute the OR-aggregated `recursive_has_symlinks` flag on both chains). The integer-keyed delete/subtree-delete handlers auto-propagate negative deltas via the `parent_id` chain (same pattern as the path-keyed variants). `propagate_delta_by_id` walks the parent chain using `get_parent_id` lookups. `UpsertEntryV2` auto-propagates deltas on both insert and update: on insert, propagates the full size (+file_count or +dir_count); on update, reads the old entry first and propagates only the size difference. This means callers never need a separate `PropagateDeltaById` for upserted entries. For new directories, also initializes a zero-valued `dir_stats` row so enrichment always has a row. Maintains `AccumulatorMaps` during `InsertEntriesV2` processing (two HashMaps: direct children stats and child dir relationships + an `entries_inserted` counter), cleared on `TruncateData`. On `ComputeAllAggregates`, passes accumulated maps to `aggregator::compute_all_aggregates_with_maps()` to skip expensive full-table-scan SQL queries. Accepts an optional `AppHandle` at spawn time to emit `index-aggregation-progress` events during aggregation (phase, current, total). Also emits `saving_entries` phase progress during `InsertEntriesV2` processing when the expected total is set via `set_expected_total_entries()` (an `Arc<AtomicU64>` shared between the writer thread and the `IndexWriter` handle). No index drop/recreate dance. The `idx_parent_name_folded` composite index uses binary collation and stays present during scans.
@@ -81,15 +81,15 @@ All writes go through a dedicated `std::thread` via a bounded `sync_channel` (20
81
81
82
82
Reads happen on separate WAL connections (any thread). A `ReadPool` provides thread-local read connections for enrichment and verification without contending on the `INDEXING` state-machine mutex.
83
83
84
-
### SQLite schema (v11: integer-keyed, unified `name_folded` column, inode for hardlink dedup + dir rename detection, `recursive_has_symlinks` flag)
-`entries` (id INTEGER PK, parent_id, name COLLATE platform_case, name_folded, is_directory, is_symlink, logical_size, physical_size, modified_at, inode). Root sentinel: id=1, parent_id=0, name="".
92
-
-`name_folded TEXT NOT NULL` stores `normalize_for_comparison(name)` on all platforms. On macOS this is NFD + case fold; on Linux/Windows it's the identity (same as `name`). Index: `idx_parent_name_folded ON entries (parent_id, name_folded)`.
92
+
-`name_folded TEXT NOT NULL` stores `normalize_for_comparison(name)` on all platforms. On macOS this is NFD + case fold; on Linux/Windows it's the identity (same as `name`). Index: `idx_parent_name_folded ON entries (parent_id, name_folded)` (UNIQUE since v12).
93
93
- The old `idx_parent(parent_id)` from v5 is removed; the composite index replaces it.
-**Schema v9**: Unified `name_folded` column and `idx_parent_name_folded` index across all platforms (previously macOS only). On Linux/Windows, `normalize_for_comparison()` is the identity function, so `name_folded = name`, one extra column with no behavioral change. Eliminates 12 `#[cfg(target_os)]` blocks from store.rs SQL.
107
107
-**Schema v10**: Added `recursive_has_symlinks INTEGER NOT NULL DEFAULT 0` to `dir_stats`. OR-aggregated bottom-up: a directory's flag is `true` iff any direct child is a symlink OR any child directory's stored flag is `true`. Surfaced via `DirStats.recursive_has_symlinks` (IPC) and `FileEntry.recursive_has_symlinks` (enrichment). The frontend renders an inline `(i)` icon next to a folder's size in `SelectionInfo` to explain why a symlink-only folder shows 0 bytes (we follow `du`/Finder's behavior of not double-counting symlinked content). Schema bump triggers a full re-scan on next launch; no online migration. The index is a disposable cache.
108
108
-**Schema v11**: No SQL change. Bumped to force a rebuild so the scanner repopulates `entries.inode` for directories (used to be `None`; see `metadata.rs`). Required for inode-based rename detection in the live event loop's `detect_renames_by_inode` pre-pass to actually do anything on existing user databases; without the bump, every directory in a pre-v11 DB would have `inode IS NULL` and `find_entry_by_inode` would never match.
109
+
-**Schema v12**: Reinstated `UNIQUE` on `(parent_id, name_folded)` (the constraint v5 dropped). v5's perf reason was the `platform_case` collation B-tree cost; v6 sidestepped that by pre-computing `name_folded` at insert time, so the unique index now uses binary collation and builds in seconds. Without the constraint, two concurrent `IndexWriter` instances racing on the same DB (each with its own `Arc<AtomicI64>` ID counter) could insert distinct rows for the same `(parent_id, name)`, which the aggregator then double-counted into `dir_stats` — observed as a 1.83 TB ghost size on the `..` row of a 994 GB volume. The state-machine fix (lock-first `start_indexing`) closes the trigger; this schema constraint is the safety net.
109
110
110
111
## How to test
111
112
@@ -169,6 +170,10 @@ function is shared so the two gate sites can't drift apart.
169
170
170
171
**Single-writer thread, not connection pooling**: SQLite write concurrency is limited by its single-writer design. Instead of fighting it with `BUSY_TIMEOUT` and retries, one dedicated thread owns the write connection. Eliminates contention entirely.
171
172
173
+
**Lock-first `start_indexing`, atomic phase reservation**: `start_indexing` opens a temporary `IndexStore` and atomically claims the `Disabled -> Initializing(store)` transition via `try_reserve_initializing_phase()` BEFORE constructing the heavy `IndexManager` (which spawns the writer thread). If the phase is already `Initializing`, `Running`, or `ShuttingDown`, the call is a no-op. The previous design did the inline assignment `*guard = Initializing { store }` AFTER `IndexManager::new`, so two near-simultaneous calls both spawned writer threads — each with its own `Arc<AtomicI64>` ID counter and `AccumulatorMaps` — racing on the same DB. The lock-first guard makes second-call no-op behaviour the contract; the v12 UNIQUE index is the safety net if the contract is ever bypassed.
174
+
175
+
**`accumulator.accumulate` runs only after the DB commit succeeds**: `handle_insert_entries_v2` previously updated `AccumulatorMaps` before calling `IndexStore::insert_entries_v2_batch`. The batch is wrapped in a SQLite savepoint that rolls back on any conflict (now including UNIQUE on `(parent_id, name_folded)`), so a failed batch left the accumulator claiming bytes that never reached the DB; `compute_all_aggregates_with_maps` then wrote inflated `dir_stats`. The fix is a two-line reorder: insert first, accumulate on `Ok`. With the lock-first guard preventing duplicate writers, the failure path is rare in practice, but the contract — "in-memory state never claims more than the DB has" — should hold under any future failure mode.
176
+
172
177
**Index enrichment at read time, not cache time**: `recursive_size` fields are populated on every `get_file_range` call via a batch SQLite read from `dir_stats`. This avoids stale data and keeps enrichment consistent with the latest DB state. The cost is microseconds per page on a WAL connection.
173
178
174
179
**Enrichment uses integer-keyed batch lookup**: Instead of N individual `resolve_path()` calls (one per directory in the listing), `enrich_entries_with_index` resolves the parent directory once, queries `list_child_dir_ids_and_names(parent_id)` for all child dir IDs, then `get_dir_stats_batch_by_ids()`. Two indexed queries total instead of N. Falls back to individual path resolution for edge cases (for example, mixed-parent entries).
**Reconciler must delete old subtree on dir-to-file type changes**: When `reconcile_subtree` matches a filesystem entry to a DB entry by name, it must check if `is_directory` changed. If a directory became a file, `DeleteSubtreeById` must be sent before `UpsertEntryV2`. Without this, `INSERT OR REPLACE` keeps the same row ID (same `parent_id + name`), and the old directory's children become logical orphans: entries parented by a file.
264
269
265
-
**Scanner's `insert_entries_v2_batch` uses plain `INSERT`**: With the old `idx_parent_name` unique index, `INSERT OR REPLACE` would silently delete the old row and insert a new one with a new ID, orphaning all children. That unique index is gone (replaced by `idx_parent_name_folded`on macOS / `idx_parent_name` on Linux), and the only unique constraint is the integer PK (`id`). IDs are allocated from a shared `Arc<AtomicI64>` counter owned by `IndexWriter`, and the table is truncated before full scans (or descendants deleted before subtree scans), so PK conflicts shouldn't occur. The batch insert uses plain `INSERT` to reflect this.
270
+
**Scanner's `insert_entries_v2_batch` uses plain `INSERT`**: `INSERT OR REPLACE` would silently delete the old row and insert a new one with a new ID, orphaning all children. The table has two unique constraints: PK on `id`, and (since v12) the composite UNIQUE `(parent_id, name_folded)`. IDs are allocated from a shared `Arc<AtomicI64>` counter owned by `IndexWriter`, and the table is truncated before full scans (or descendants deleted before subtree scans), so under a single live writer neither conflict can occur. The plain `INSERT` reflects the invariant. If a conflict ever does fire (e.g. someone defeats the lock-first guard), the batch savepoint rolls back and `handle_insert_entries_v2` skips the accumulator update, so the in-memory aggregation state stays consistent with the DB.
266
271
267
272
**IndexWriter owns the shared `next_id` counter**: All ID allocation goes through an `Arc<AtomicI64>` owned by `IndexWriter`. The scanner's `ScanContext` atomically increments it via `alloc_id()`, and the writer thread bumps it via `fetch_max` after `UpsertEntryV2` inserts (which let SQLite auto-assign). `TruncateData` resets it to 2. This eliminates the race where `ScanContext` previously read a stale `MAX(id)` from a read connection while the writer had uncommitted inserts in its channel. `IndexWriter` still exposes `db_path()`. The scanner opens a temporary connection for `resolve_path` (subtree scans) and `ensure_root_sentinel` (volume scans), but no longer for ID allocation.
0 commit comments