You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Bugfix: index rename no longer fails when the destination name is already taken
`MoveEntryV2` used to run a bare UPDATE that hit the UNIQUE `(parent_id, name_folded)` constraint whenever a different row already occupied the destination (rename-with-overwrite, or a concurrent upsert racing ahead of the move message). The moved entry then stayed stuck at its old location, the same id failed repeatedly with `MoveEntryV2 update failed ... UNIQUE constraint failed` warns, and only per-navigation verification eventually healed the drift.
- `handle_move_entry_v2` now resolves the destination first and, when a different entry owns it, deletes that row (subtree-aware, with full delta propagation via the existing `handle_delete_entry_by_id` / `handle_delete_subtree_by_id`) before moving. The on-disk truth after a rename is that the moved entry owns the destination name, so this preserves the moved entry's `id` and `dir_stats` instead of waiting for the verifier.
- TDD'd: `move_entry_v2_destination_collision_replaces_conflicting_file` and `..._replaces_conflicting_dir_subtree` reproduce the collision and pin the replace semantics.
- Also demoted the `stall_probe::sqlite_busy` busy-handler logging: per-attempt lines go to debug (brief contention from WAL checkpoints and long-lived readers is routine and was the noisiest warn in a normal dev session); attempt >= 20 (>100ms lock wait, a genuine stall signal) stays warn.
Copy file name to clipboardExpand all lines: apps/desktop/src-tauri/src/indexing/CLAUDE.md
+5-1Lines changed: 5 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -20,7 +20,7 @@ Design history is in git (former `docs/specs/drive-indexing/`).
20
20
-**store.rs** -- SQLite schema (integer-keyed entries with `name_folded` column on all platforms, `inode` column for hardlink dedup, `dir_stats` by entry_id, `meta`), `platform_case` collation, read queries, DB open/migrate. `resolve_component` always queries by `(parent_id, name_folded)` using the `idx_parent_name_folded` composite **UNIQUE** index. On Linux/Windows, `normalize_for_comparison()` is the identity function, so `name_folded = name` and the index behaves identically to a `(parent_id, name)` index. Schema version check: mismatch triggers drop+rebuild. `has_sized_entry_for_inode()` checks if another entry with the same inode already has non-NULL sizes; `find_entry_by_inode()` returns the first row with a given inode (used by the live event loop's rename pre-pass). Both path-keyed (backward compat) and integer-keyed APIs.
21
21
-**metadata.rs** -- `MetadataSnapshot` struct and `extract_metadata()` function. Single location for all platform-specific metadata extraction (logical/physical size, mtime, inode, nlink). Used by scanner, reconciler, verifier, and event_loop. Symlinks get `None` everywhere. Files get sizes + inode + nlink. Directories get inode but no sizes/nlink. The inode is what the live event loop's rename pre-pass matches against to detect dir renames in place.
22
22
-**memory_watchdog.rs** -- Background task monitoring resident memory via `mach_task_info` (macOS). Warns at 8 GB, stops indexing at 16 GB, emits `index-memory-warning` event to frontend. No-op stub on non-macOS. Started from `start_indexing()`.
23
-
- **writer.rs** -- Single writer thread, owns the write connection, processes `WriteMessage` channel (bounded `sync_channel`, 20K capacity, backpressure via blocking). `WRITER_GENERATION: AtomicU64` (initialized to 1) bumped on every mutation (`InsertEntriesV2`, `UpsertEntryV2`, `MoveEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `TruncateData`) for search index staleness detection. Priority: `UpdateDirStats` before `InsertEntries`. `Flush` variant + async `flush()` method let callers wait for all prior writes to commit. Has both integer-keyed variants (`InsertEntriesV2`, `UpsertEntryV2`, `MoveEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `PropagateDeltaById`) and path-keyed backward-compat variants. `MoveEntryV2 { entry_id, new_parent_id, new_name }` updates an entry's `(parent_id, name, name_folded)` in place, preserving its `id` and (for directories) `dir_stats`. Same-parent renames don't change ancestor totals; cross-parent moves subtract the entry's contribution from the old ancestor chain and add it to the new one (and recompute the OR-aggregated `recursive_has_symlinks` flag on both chains). The integer-keyed delete/subtree-delete handlers auto-propagate negative deltas via the `parent_id` chain (same pattern as the path-keyed variants). `propagate_delta_by_id` walks the parent chain using `get_parent_id` lookups. `UpsertEntryV2` auto-propagates deltas on both insert and update: on insert, propagates the full size (+file_count or +dir_count); on update, reads the old entry first and propagates only the size difference. This means callers never need a separate `PropagateDeltaById` for upserted entries. For new directories, also initializes a zero-valued `dir_stats` row so enrichment always has a row. Maintains `AccumulatorMaps` during `InsertEntriesV2` processing (two HashMaps: direct children stats and child dir relationships + an `entries_inserted` counter), cleared on `TruncateData`. On `ComputeAllAggregates`, passes accumulated maps to `aggregator::compute_all_aggregates_with_maps()` to skip expensive full-table-scan SQL queries. On `ComputePartialAggregates { hot_paths }` (mid-scan), `handle_compute_partial_aggregates` borrows the same maps **read-only** (no clear, no mutation, no generation bump), no-ops on empty maps with no SQL fallback, delegates the math to `aggregator::compute_partial_aggregates`, writes a depth-capped (`PARTIAL_AGG_MAX_DEPTH = 3`) subset of `dir_stats` rows, and emits `index-dir-updated { paths: ["/"] }` when an `AppHandle` is present. Accepts an optional `AppHandle` at spawn time to emit `index-aggregation-progress` events during aggregation (phase, current, total). `IndexWriter::try_send` is a non-blocking send (`Ok(true)` enqueued / `Ok(false)` channel full, dropped / `Err` writer gone) with `queue_depth()` accessor over the channel-depth atomic; the bump/undo accounting lives in the extracted `try_send_with_depth` free function (undoes the bump on both `Full` and `Disconnected` so the depth never drifts). Also emits `saving_entries` phase progress during `InsertEntriesV2` processing when the expected total is set via `set_expected_total_entries()` (an `Arc<AtomicU64>` shared between the writer thread and the `IndexWriter` handle). No index drop/recreate dance. The `idx_parent_name_folded` composite index uses binary collation and stays present during scans.
23
+
- **writer.rs** -- Single writer thread, owns the write connection, processes `WriteMessage` channel (bounded `sync_channel`, 20K capacity, backpressure via blocking). `WRITER_GENERATION: AtomicU64` (initialized to 1) bumped on every mutation (`InsertEntriesV2`, `UpsertEntryV2`, `MoveEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `TruncateData`) for search index staleness detection. Priority: `UpdateDirStats` before `InsertEntries`. `Flush` variant + async `flush()` method let callers wait for all prior writes to commit. Has both integer-keyed variants (`InsertEntriesV2`, `UpsertEntryV2`, `MoveEntryV2`, `DeleteEntryById`, `DeleteSubtreeById`, `PropagateDeltaById`) and path-keyed backward-compat variants. `MoveEntryV2 { entry_id, new_parent_id, new_name }` updates an entry's `(parent_id, name, name_folded)` in place, preserving its `id` and (for directories) `dir_stats`. If a different entry already occupies the destination `(parent_id, name_folded)` (rename-with-overwrite, or a concurrent upsert racing ahead of the move message), the handler deletes the conflicting row first (subtree-aware, with delta propagation) so the move never fails the UNIQUE constraint; the on-disk truth after a rename is that the moved entry owns the destination name. Same-parent renames don't change ancestor totals; cross-parent moves subtract the entry's contribution from the old ancestor chain and add it to the new one (and recompute the OR-aggregated `recursive_has_symlinks` flag on both chains). The integer-keyed delete/subtree-delete handlers auto-propagate negative deltas via the `parent_id` chain (same pattern as the path-keyed variants). `propagate_delta_by_id` walks the parent chain using `get_parent_id` lookups. `UpsertEntryV2` auto-propagates deltas on both insert and update: on insert, propagates the full size (+file_count or +dir_count); on update, reads the old entry first and propagates only the size difference. This means callers never need a separate `PropagateDeltaById` for upserted entries. For new directories, also initializes a zero-valued `dir_stats` row so enrichment always has a row. Maintains `AccumulatorMaps` during `InsertEntriesV2` processing (two HashMaps: direct children stats and child dir relationships + an `entries_inserted` counter), cleared on `TruncateData`. On `ComputeAllAggregates`, passes accumulated maps to `aggregator::compute_all_aggregates_with_maps()` to skip expensive full-table-scan SQL queries. On `ComputePartialAggregates { hot_paths }` (mid-scan), `handle_compute_partial_aggregates` borrows the same maps **read-only** (no clear, no mutation, no generation bump), no-ops on empty maps with no SQL fallback, delegates the math to `aggregator::compute_partial_aggregates`, writes a depth-capped (`PARTIAL_AGG_MAX_DEPTH = 3`) subset of `dir_stats` rows, and emits `index-dir-updated { paths: ["/"] }` when an `AppHandle` is present. Accepts an optional `AppHandle` at spawn time to emit `index-aggregation-progress` events during aggregation (phase, current, total). `IndexWriter::try_send` is a non-blocking send (`Ok(true)` enqueued / `Ok(false)` channel full, dropped / `Err` writer gone) with `queue_depth()` accessor over the channel-depth atomic; the bump/undo accounting lives in the extracted `try_send_with_depth` free function (undoes the bump on both `Full` and `Disconnected` so the depth never drifts). Also emits `saving_entries` phase progress during `InsertEntriesV2` processing when the expected total is set via `set_expected_total_entries()` (an `Arc<AtomicU64>` shared between the writer thread and the `IndexWriter` handle). No index drop/recreate dance. The `idx_parent_name_folded` composite index uses binary collation and stays present during scans.
24
24
- **scanner.rs** -- jwalk-based parallel directory walker. `scan_volume()` for full scan, `scan_subtree()` for targeted subtree rescans (used by post-replay background verification). Uses `ScanContext` (from store.rs) to assign integer IDs and parent IDs during the walk: maintains a `HashMap<PathBuf, i64>` mapping directory paths to assigned IDs, with IDs allocated from the shared `Arc<AtomicI64>` counter owned by `IndexWriter`. The scan root is mapped to `ROOT_ID` (1). Sends `InsertEntriesV2(Vec<EntryRow>)` batches to the writer. Platform-specific exclusion filters via `should_exclude` (`pub(super)`), the single exclusion gate for all code paths (scanner, reconciler, event_loop verification, per-navigation verifier). E2E scan restriction: when `CMDR_E2E_START_PATH` is set, `should_exclude` restricts scanning to only the fixture path, its children, and ancestors. Everything else is excluded (critical for Docker E2E performance). `default_exclusions()` is `#[cfg(test)]` only. Physical sizes (`st_blocks * 512`). Hardlink inode dedup: files with `nlink > 1` are tracked in a `HashSet<u64>` by inode; only the first link's size is counted, subsequent links get `size = None`. Files with `nlink == 1` (vast majority) skip the set entirely. All files store `inode` in `EntryRow.inode` (from `MetadataExt::ino()` on Unix, `None` on non-Unix). Directories and symlinks get `inode: None`.
25
25
- **aggregator.rs** -- Dir stats computation. Bottom-up after full scan (O(N) single pass), per-subtree after subtree rescans, incremental delta propagation up ancestor chain for watcher events. Two entry points for full aggregation: `compute_all_aggregates_reported` (loads maps from SQL) and `compute_all_aggregates_with_maps` (accepts pre-built maps from the writer). Both accept an `on_progress: &mut dyn FnMut(AggregationProgress)` callback and delegate to `compute_and_write()` for the shared topological sort + bottom-up computation + batch write. Progress is reported at phase transitions and every ~1% during compute/write loops. `AggregationPhase` enum: `SavingEntries` (flushing writer channel), `LoadingDirectories`, `Sorting`, `Computing`, `Writing`. The composite indexes use binary collation so there's no per-scan index rebuild phase. `compute_partial_aggregates` is the mid-scan variant: it derives the dir list and parent relations from the borrowed accumulator maps (no SQL `load_all_directory_ids` scan), computes each dir's depth from the scan root via a memoized walk (`depth(ROOT_ID) = 0` is the explicit base case; unreachable dirs get `usize::MAX` so the depth cap never writes them), reuses the same `topological_sort_bottom_up` + `compute_bottom_up` over **all** dirs, and writes only dirs at `depth ≤ max_depth` plus each resolvable hot-path dir and its direct children. `backfill_missing_dir_stats` is a catch-up pass that finds directories without `dir_stats` rows and computes their stats bottom-up; triggered after reconciler replay and cold-start replay via `BackfillMissingDirStats` writer message.
26
26
-**watcher.rs** -- Drive-level filesystem watcher. macOS: FSEvents via `cmdr-fsevent-stream` with event IDs and `sinceWhen` replay. Linux: `notify` crate (inotify backend) with recursive watching and synthetic event counter. Other platforms: stub. `supports_event_replay()` lets callers branch on whether journal replay is available.
@@ -257,6 +257,10 @@ for unknown paths in Xs [total], sample: <path>`. Error reports retain the
257
257
existence-of-drift signal. Live in `reconciler.rs::unknown_path_skips`. To get the
The writer's SQLite busy retry logger (`stall_probe::sqlite_busy` in `writer.rs::spawn`) logs
261
+
per-attempt at DEBUG: brief contention is routine (WAL checkpoints, long-lived readers). It
262
+
escalates to WARN at attempt >= 20 (>100ms of lock wait), which signals a genuine stall.
263
+
260
264
## Gotchas
261
265
262
266
**INSERT OR REPLACE on a populated DB is catastrophically slow**: The `platform_case` collation (NFD + case fold on macOS) runs for every B-tree comparison during unique index lookups. On an empty DB a full scan takes ~2.5 min; on a populated DB with 5.5M entries the same scan takes ~30 min because each `INSERT OR REPLACE` triggers ~20 collation calls to traverse the B-tree. `start_scan()` truncates `entries` and `dir_stats` via `TruncateData` + `flush_blocking()` before every scan to avoid this. Additionally, without truncation, old rows accumulate as orphaned subtrees (3-4x DB bloat per scan cycle) because `INSERT OR REPLACE` only deduplicates at the root level.
0 commit comments