Commit 19f0e98

Git browser: cache list_status by index mtime
- New process-global `RwLock<HashMap<RepoRoot, CachedStatus>>` in `status.rs`. One full-repo `git status --porcelain=v2 -z --untracked-files=normal` per `.git/index` mtime change. Subsequent calls slice the cached map by `dir_in_worktree` in memory.
- Watcher invalidates the snapshot in `recompute_and_emit` (any `.git/*` mutation drops the cache; the next call repopulates) and on the last unsubscribe so unwatched repos don't pin a snapshot forever.
- Always run with `--untracked-files=normal`. Decision noted in `CLAUDE.md`: with the cache, the untracked walk is amortized; the earlier "skip outside root" sketch buys nothing.
- Tests: 4 slice tests (root, descendants, lookalikes, self-dir), 4 cache tests (hit, mtime invalidate, explicit invalidate, slice-from-cache).
- Bench split into cold + warm paths. On the 50k-file fixture: cold p50=84 ms / p95=99 ms, warm p50=15 µs / p95=18 µs (~5500× faster on cache hits).
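The caching scheme described in the commit message can be sketched as follows. This is a minimal illustration, not the code in `status.rs`: the `CachedStatus` fields, the `snapshot_for` and `invalidate` names, the use of `PathBuf` in place of `RepoRoot`, and the `walk` closure standing in for the `git status` invocation are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, OnceLock, RwLock};
use std::time::SystemTime;

/// Hypothetical stand-in for the real `CachedStatus`.
struct CachedStatus {
    /// mtime of `.git/index` at the moment the snapshot was taken.
    index_mtime: SystemTime,
    /// Worktree-relative path -> status code (illustrative shape only).
    entries: HashMap<PathBuf, String>,
}

/// Process-global cache keyed by repo root (`PathBuf` stands in for `RepoRoot`).
fn cache() -> &'static RwLock<HashMap<PathBuf, Arc<CachedStatus>>> {
    static CACHE: OnceLock<RwLock<HashMap<PathBuf, Arc<CachedStatus>>>> = OnceLock::new();
    CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Returns the snapshot for `root`. `walk` (the expensive full-repo status)
/// runs only when there is no snapshot or the stored mtime differs.
fn snapshot_for(
    root: &Path,
    index_mtime: SystemTime,
    walk: impl FnOnce() -> HashMap<PathBuf, String>,
) -> Arc<CachedStatus> {
    {
        // Fast path: shared read lock, released at the end of this scope.
        let guard = cache().read().unwrap();
        if let Some(hit) = guard.get(root) {
            if hit.index_mtime == index_mtime {
                return Arc::clone(hit);
            }
        }
    }
    // Slow path: walk without holding any lock, then publish the result.
    let fresh = Arc::new(CachedStatus { index_mtime, entries: walk() });
    cache()
        .write()
        .unwrap()
        .insert(root.to_path_buf(), Arc::clone(&fresh));
    fresh
}

/// Drops the snapshot for `root`; the next `snapshot_for` call repopulates it.
fn invalidate(root: &Path) {
    cache().write().unwrap().remove(root);
}
```

In this sketch two concurrent misses may both run the walk and the later insert wins. That keeps the lock out of the slow path at the cost of occasional duplicate work; whether the real implementation makes the same trade-off is not stated in the commit.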
1 parent 357d38a commit 19f0e98

4 files changed

Lines changed: 378 additions & 25 deletions

apps/desktop/src-tauri/src/file_system/git/CLAUDE.md

Lines changed: 24 additions & 1 deletion
@@ -29,7 +29,7 @@ three new error variants (`ShallowBoundary`, `MissingObject`,
 | `submodules.rs` | `list_submodules` – gix `Repository::submodules()`. Each entry sets `redirect_to_path` to `<repo_root>/<rel-path>` |
 | `tree.rs` | `list_tree`, `get_tree_entry`, `lookup_blob_id`, `read_blob` – gix tree walks. Permissions reflect `EntryKind::BlobExecutable` so cross-volume copy preserves the executable bit |
 | `read_blob.rs` | `GitBlobReadStream` – owns the full `Vec<u8>` and yields 256 KB chunks. See *Honest blob streaming* below |
-| `status.rs` | `list_status(repo, dir)` shells out to `git status --porcelain=v2 -z`. Parses the output into a `Vec<EntryStatus>` |
+| `status.rs` | `list_status(repo, dir)` runs a full-repo `git status --porcelain=v2 -z` once per `.git/index` mtime, caches the result in a process-global `RwLock<HashMap<RepoRoot, CachedStatus>>`, and slices it by `dir`. The watcher invalidates the snapshot whenever `.git/*` changes. Parses porcelain v2 in `parse_porcelain_v2`. |
 | `watcher.rs` | `GitWatcherRegistry` – per-repo notify-rs debouncer. `subscribe(app, root)` returns the current `RepoInfo` synchronously and emits `git-state-changed` on relevant `.git/*` mutations. 200 ms debounce. M2: also calls `notify_directory_changed(.., FullRefresh)` for any cached `.git/{branches,tags}/` listings on the local volume |
 | `friendly.rs` | `FriendlyGitError`, `FriendlyGitErrorKind` – ten variants (M1's six, `BlobTooLarge` from M2, plus M4's `ShallowBoundary`, `MissingObject`, `GitDirPermissionDenied`). Active-voice copy, no "error" / "failed". `to_friendly_error()` builds a `volume::FriendlyError` for `ErrorPane`; `encode_for_volume_error()` + `try_decode_git_friendly()` carry the structured payload through `VolumeError::IoError` so the streaming pipeline rebuilds it on the way out |
 | `column_meta.rs` | Per-row column-population helpers shared across `virtual_listing`, `log`, `tree`, etc. — `pluralize`, `ahead_behind_for_branch`, `commit_meta`, `files_changed_count`, `recursive_tree_size`, plus newest-of-set helpers for category-level Modified dates |
@@ -146,6 +146,29 @@ The frontend reads `display_size` / `display_size_tooltip` from `FileEntry`; the
 
 ## Decisions
 
+**Decision (M4 follow-up)**: Cache `list_status` results keyed by `.git/index` mtime
+**Why**: Status used to walk the worktree on every `listing-complete` (every nav,
+every diff). On a 50k-file repo that's ~75 ms per nav. We now run one full-repo
+walk per index change, store the result in a process-global
+`RwLock<HashMap<RepoRoot, CachedStatus>>`, and slice by `dir_in_worktree` on
+each call. Cached calls land sub-millisecond on the same fixture (warm p95 in
+the bench is bounded by an arbitrary 5 ms ceiling so a busy CI doesn't flake).
+The watcher (`watcher.rs::recompute_and_emit`) drops the cache entry on every
+`.git/*` mutation it observes, so the next call repopulates. The
+`unsubscribe`-on-last-pane path also drops the entry so an unwatched repo
+doesn't pin a full-repo-sized snapshot.
+
+**Decision (M4 follow-up)**: Always run with `--untracked-files=normal`, no
+"skip untracked outside the worktree root" trick
+**Why**: An earlier sketch had us pass `--untracked-files=no` when the caller
+scoped to a sub-path inside the worktree, on the theory that listing a deep
+subdir doesn't need the full untracked walk. With the cache above, the
+untracked walk runs once per index change anyway and the cost is amortized
+across every subsequent listing — the extra complexity (two code paths,
+mismatched cache keys for the same repo) buys nothing measurable. We always
+walk the full worktree with `--untracked-files=normal` and let the cache do
+the work.
+
 **Decision (M4)**: Live-toggleable portal via a process-global `AtomicBool`
 **Why**: `try_route_listing` / `try_route_metadata` / `try_open_blob_stream`
 each early-return `None` when the toggle is off, falling through to the
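
The "slice by `dir_in_worktree`" step from the first decision can be illustrated with a small sketch. The function name `slice_by_dir`, the `BTreeMap<PathBuf, String>` snapshot shape, and the choice to include an entry whose path equals the directory itself are assumptions for this example, not the behavior of `status.rs`. The lookalike case from the commit's test list (for example `src` versus `src-old`) is the reason the sketch compares whole path components rather than string prefixes.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Returns the cached entries at or below `dir_in_worktree`.
/// An empty `dir_in_worktree` means the worktree root and matches everything.
fn slice_by_dir<'a>(
    entries: &'a BTreeMap<PathBuf, String>,
    dir_in_worktree: &Path,
) -> Vec<(&'a Path, &'a str)> {
    entries
        .iter()
        // `Path::starts_with` matches whole components, so "src-old/x"
        // is not treated as living under "src".
        .filter(|(path, _)| path.starts_with(dir_in_worktree))
        .map(|(path, status)| (path.as_path(), status.as_str()))
        .collect()
}
```

A plain `str::starts_with` on the path strings would accept `src-old/main.rs` for the directory `src`, which is the kind of bug a lookalike test is meant to catch.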

apps/desktop/src-tauri/src/file_system/git/bench.rs

Lines changed: 28 additions & 11 deletions
@@ -117,27 +117,44 @@ fn bench_50k_files_discover_and_repo_info_under_budget() {
 #[test]
 #[ignore = "Builds a 50k-file fixture – opt-in via `cargo test -- --ignored`"]
 fn bench_50k_files_list_status_under_budget() {
+    use super::status::invalidate_status_cache;
     let dir = fixture_dir();
     ensure_fixture(&dir);
 
-    let (handle, _root) = discover_repo(&dir).expect("discover");
-    // Warm-up: gix's caches and the OS page cache.
-    let _ = list_status(&handle, &dir);
+    let (handle, root) = discover_repo(&dir).expect("discover");
 
-    let mut samples_us = Vec::with_capacity(RUNS);
+    // ── Cold path: invalidate before each run so we measure a real walk.
+    let mut cold_us = Vec::with_capacity(RUNS);
     for _ in 0..RUNS {
+        invalidate_status_cache(&root);
         let start = Instant::now();
         let _entries = list_status(&handle, &dir).expect("status");
-        samples_us.push(start.elapsed().as_micros());
+        cold_us.push(start.elapsed().as_micros());
     }
-    let p95_us = percentile(samples_us.clone(), 95.0);
-    let p50_us = percentile(samples_us.clone(), 50.0);
+    let cold_p95 = percentile(cold_us.clone(), 95.0);
+    let cold_p50 = percentile(cold_us.clone(), 50.0);
     eprintln!(
-        "list_status: p50={}ms p95={}ms (budget 100 ms)",
-        p50_us / 1000,
-        p95_us / 1000
+        "list_status (cold, full walk): p50={}ms p95={}ms (budget 100 ms)",
+        cold_p50 / 1000,
+        cold_p95 / 1000
     );
-    assert!(p95_us / 1000 <= 100, "p95 over budget: {}ms", p95_us / 1000);
+    assert!(cold_p95 / 1000 <= 100, "cold p95 over budget: {}ms", cold_p95 / 1000);
+
+    // ── Warm path: walk once, then time cache hits. Should be sub-millisecond.
+    invalidate_status_cache(&root);
+    let _ = list_status(&handle, &dir).expect("warmup");
+    let mut warm_us = Vec::with_capacity(RUNS);
+    for _ in 0..RUNS {
+        let start = Instant::now();
+        let _entries = list_status(&handle, &dir).expect("status");
+        warm_us.push(start.elapsed().as_micros());
+    }
+    let warm_p95 = percentile(warm_us.clone(), 95.0);
+    let warm_p50 = percentile(warm_us.clone(), 50.0);
+    eprintln!("list_status (warm, cached): p50={}µs p95={}µs", warm_p50, warm_p95);
+    // Warm path is in-memory slice + mtime stat. Allow generous 5 ms ceiling
+    // so a busy CI box doesn't flake; in practice this lands under 1 ms.
+    assert!(warm_p95 <= 5_000, "warm p95 over 5ms: {}µs", warm_p95);
 }
 
 // ── Modified + Size column population bench (M4 follow-up) ──────────
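
The bench calls a `percentile` helper whose body is not part of this diff. For readers following along, here is one plausible shape for it, assuming the nearest-rank definition and the `(Vec<u128>, f64) -> u128` signature implied by the call sites. The actual helper in `bench.rs` may interpolate or round differently.

```rust
/// Nearest-rank percentile over microsecond samples.
/// Takes the vector by value because it sorts in place, which matches the
/// `percentile(samples.clone(), 95.0)` call pattern in the bench.
fn percentile(mut samples: Vec<u128>, pct: f64) -> u128 {
    assert!(!samples.is_empty(), "percentile of an empty sample set");
    samples.sort_unstable();
    // Multiply before dividing so whole-number percentiles stay exact in f64.
    let rank = (pct * samples.len() as f64 / 100.0).ceil() as usize;
    // Ranks are 1-based; clamp so pct = 0 and pct = 100 stay in bounds.
    samples[rank.clamp(1, samples.len()) - 1]
}
```

With nearest-rank, p95 over a small `RUNS` is simply one of the slowest observed samples, so a single slow run on a busy machine can move it. That is consistent with the generous 5 ms ceiling on the warm path.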
