| 1 | +# Selection module |
| 2 | + |
| 3 | +Backend for the Selection dialog (Select files / Deselect files). Mirrors `crate::search` |
| 4 | +but narrower: there is no scope, no system-dir exclusion, no in-memory index, and the |
| 5 | +matcher itself runs in JS against the focused folder's entries. This module owns just |
| 6 | +the persistent history store and the AI translation pipeline. |
| 7 | + |
| 8 | +## Module structure |
| 9 | + |
| 10 | +| File | Purpose | |
| 11 | +|------|---------| |
| 12 | +| `mod.rs` | Re-exports the public surface. | |
| 13 | +| `history.rs` | `SelectionHistoryEntry`, atomic JSON read/write, canonical-key dedupe, cap eviction, schema-version quarantine. Re-exports `HistoryMode` and `HistoryFilters` from `crate::search::history` so the frontend sees the same mode/filter shape for both consumers. | |
| 14 | +| `ai/mod.rs` | Re-exports the AI submodules. | |
| 15 | +| `ai/prompt.rs` | `build_classification_prompt(sample_names)` and `format_sample_block`. Pure functions; no IPC. Returns the system-prompt string the LLM receives. | |
| 16 | +| `ai/parser.rs` | `parse_selection_response(text)` → `ParsedSelectionLlmResponse`. Key-value line parser; mirrors `search::ai::parser` in style but with the narrower field set. | |
| 17 | +| `ai/query_builder.rs` | `build_selection_translate_result(parsed)` → `SelectionTranslateResult`. Assembles the result type that crosses IPC; `generate_caveat` and `build_label` are the supporting helpers. | |
| 18 | + |
| 19 | +The IPC layer is in `crate::commands::selection`. |
| 20 | + |
| 21 | +## History store |
| 22 | + |
| 23 | +Persistent recent-selections store for the dialog's footer + popover. Same atomic-write |
| 24 | +story as `crate::search::history`; the key design points are: |
| 25 | + |
| 26 | +- **Persistence path**: `{app_data_dir}/selection-history.json`. Schema-versioned via |
| 27 | + `_schemaVersion` (currently `1`). |
| 28 | +- **In-memory cache + disk lock**: in-memory `Mutex<HistoryStore>` plus a separate |
| 29 | + `OnceLock<Mutex<()>>` (`DISK_LOCK`) that serializes the read-modify-write cycle so |
| 30 | + concurrent IPC commands can't lose writes. Cache guards drop before any `fs` call. |
| 31 | +- **Canonical dedupe key**: `mode | normalized_query | filters | case_sensitive`. Four |
| 32 | + segments; Search's key has six (it adds `scope` and `exclude_system_dirs`). Filters |
| 33 | + serialize as alphabetically-keyed `k=v,k=v` pairs with undefined fields omitted. The |
| 34 | + key is never persisted; it only exists at compare time. |
| 35 | +- **Recovery**: parse failure or schema-version mismatch → rename file to `.broken`, |
| 36 | + start fresh. The user keeps using the dialog; the corrupted file is preserved for |
| 37 | + one rotation in case debugging is needed. |
| 38 | +- **Cap**: configurable via `selection.recentSelections.maxCount` (default 1000). |
| 39 | + `apply_max_count` trims the in-memory store on live-apply; `0` clears everything and |
| 40 | + short-circuits future adds. |
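The canonical dedupe key described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `canonical_key`, the `BTreeMap` stand-in for `HistoryFilters`, the `|` separator without spaces, and the trim/lowercase normalization are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: builds the four-segment compare-time key.
/// `filters` stands in for the real `HistoryFilters`; `None` means "field unset".
fn canonical_key(
    mode: &str,
    query: &str,
    filters: &BTreeMap<&str, Option<String>>,
    case_sensitive: bool,
) -> String {
    // Assumed normalization: trim always, lowercase only for case-insensitive entries.
    let normalized = if case_sensitive {
        query.trim().to_string()
    } else {
        query.trim().to_lowercase()
    };
    // BTreeMap iterates in key order, so the pairs come out alphabetically.
    // Unset fields are omitted rather than serialized as empty values.
    let filters = filters
        .iter()
        .filter_map(|(k, v)| v.as_ref().map(|v| format!("{k}={v}")))
        .collect::<Vec<_>>()
        .join(",");
    format!("{mode}|{normalized}|{filters}|{case_sensitive}")
}
```

Two entries are duplicates when their keys compare equal. Because the key is rebuilt on every comparison, changing its shape would not require a migration of the persisted file.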
| 41 | + |
| 42 | +### Decision: separate `selection-history.json` from `search-history.json` |
| 43 | + |
| 44 | +Storing both consumers' history in one file with a `kind` discriminator was rejected. |
| 45 | +Their schemas already diverge (`scope` and `exclude_system_dirs` are irrelevant for |
| 46 | +Selection), and coupling two unrelated migrations forever didn't earn its keep. The |
| 47 | +small cost of two files is invisible at runtime. |
| 48 | + |
| 49 | +### Decision: re-export `HistoryMode` and `HistoryFilters` from `search::history` |
| 50 | + |
| 51 | +The two pure data types are identical in intent across the two consumers. The |
| 52 | +`SelectionHistoryEntry` struct itself stays separate so the on-disk schema doesn't bind |
| 53 | +Selection to Search's canonical-key shape. If Search's mode set or filter shape ever |
| 54 | +diverges from Selection's, the re-export drops out and the types fork; the wiring is |
| 55 | +already isolated enough that the change is mechanical. |
| 56 | + |
| 57 | +## AI translation |
| 58 | + |
| 59 | +The `translate_selection_query(prompt, sample_names)` IPC command orchestrates the following steps: |
| 60 | + |
| 61 | +1. Verifies the AI provider is `cloud`. Small local models (4-8K context) can't |
| 62 | + reliably fit a 200+-name folder sample plus the structured prompt and response, so |
| 63 | +   the backend hard-errors when the provider isn't `cloud`. The frontend hides the AI chip |
| 64 | + in that case, but this gate is the belt-and-braces check for an MCP caller or a |
| 65 | + misconfigured frontend. |
| 66 | +2. Calls `ai::build_classification_prompt(&sample_names)` to assemble the system |
| 67 | + prompt with today's date and the folder sample. |
| 68 | +3. Runs `chat_completion` via `crate::ai::client` against the configured cloud backend |
| 69 | + with `temperature: 0.2`, `max_tokens: 300`, `top_p: 0.9`. |
| 70 | +4. Parses the response via `ai::parse_selection_response` into a |
| 71 | + `ParsedSelectionLlmResponse`. |
| 72 | +5. Builds the wire-result via `ai::build_selection_translate_result`. |
| 73 | + |
| 74 | +### Decision: cloud-only AI for Selection |
| 75 | + |
| 76 | +Folder samples weigh in at 1-3K tokens; the prompt plus completion comes to roughly 4-5K tokens. Local |
| 77 | +4-8K context models often can't fit the full payload, and quality on small models is |
| 78 | +unreliable for pattern inference. We surface a tooltip on the gated UI in the frontend |
| 79 | +("AI selection needs a cloud provider. Set one in Settings > AI."); the backend |
| 80 | +returns the same message as a hard error for any non-cloud caller. |
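As a rough sketch of the backend gate, assuming a hypothetical `AiProvider` enum and `ensure_cloud_provider` helper (the real provider type lives in `crate::ai` and may look different), the check amounts to:

```rust
/// Hypothetical provider enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AiProvider {
    Off,
    Local,
    Cloud,
}

/// The message quoted above; the frontend tooltip and the backend error share it.
const CLOUD_ONLY_MESSAGE: &str =
    "AI selection needs a cloud provider. Set one in Settings > AI.";

/// Returns an error string for every provider except `Cloud`, so an MCP caller
/// or a misconfigured frontend gets the same explanation the tooltip gives.
fn ensure_cloud_provider(provider: AiProvider) -> Result<(), String> {
    if provider == AiProvider::Cloud {
        Ok(())
    } else {
        Err(CLOUD_ONLY_MESSAGE.to_string())
    }
}
```

Running the check before the prompt is assembled means a non-cloud caller fails fast without any network call.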
| 81 | + |
| 82 | +### Decision: key-value response format, not JSON |
| 83 | + |
| 84 | +Same rationale as `crate::search::ai`. JSON generation is the #1 failure mode for |
| 85 | +small LLMs. Key-value lines are trivial to produce and parse, missing lines are |
| 86 | +individually skippable, and malformed lines never void the whole response. |
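To illustrate why key-value lines degrade gracefully, here is a minimal parser sketch. `ParsedSketch` is a hypothetical, trimmed-down stand-in for `ParsedSelectionLlmResponse`, and the exact key names the model emits (`pattern`, `kind`, `caveat`) are assumptions for the example.

```rust
/// Hypothetical, reduced field set; the real parsed type carries more fields.
#[derive(Debug, Default, PartialEq)]
struct ParsedSketch {
    pattern: Option<String>,
    kind: Option<String>,
    caveat: Option<String>,
}

fn parse_key_value_lines(text: &str) -> ParsedSketch {
    let mut parsed = ParsedSketch::default();
    for line in text.lines() {
        // A line without a colon is malformed: skip it and keep going.
        let Some((key, value)) = line.split_once(':') else {
            continue;
        };
        let value = value.trim();
        if value.is_empty() {
            continue;
        }
        match key.trim().to_ascii_lowercase().as_str() {
            "pattern" => parsed.pattern = Some(value.to_string()),
            // Only the two supported kinds survive; anything else stays `None`.
            "kind" if matches!(value, "glob" | "regex") => {
                parsed.kind = Some(value.to_string())
            }
            "caveat" => parsed.caveat = Some(value.to_string()),
            _ => {} // unknown key or unsupported `kind` value: ignore
        }
    }
    parsed
}
```

Since `split_once` splits at the first colon only, a regex pattern that itself contains `:` is kept intact. A garbage line costs one field at most, whereas a single stray character in a JSON response would void the whole payload.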
| 87 | + |
| 88 | +### Decision: `pattern` + `kind` instead of structured filter types |
| 89 | + |
| 90 | +The matcher runs on the frontend in JS. There's no benefit to round-tripping a typed |
| 91 | +glob through Rust; the parsed string IS the contract. The kind is `glob` (full-name |
| 92 | +match, `*` and `?` only) or `regex` (JS RegExp). When `pattern` is missing or blank, |
| 93 | +`kind` is forced to `None` so the frontend doesn't compile a half-built query. |
| 94 | + |
| 95 | +### Decision: default `kind` to `glob` when the model omits it |
| 96 | + |
| 97 | +The model occasionally forgets to emit `kind:` for obvious globs (`*.png`, `*.log`). |
| 98 | +Defaulting saves a re-prompt. The parser still drops `kind` to `None` when the value |
| 99 | +isn't one of `glob`/`regex`; the builder catches the missing-kind-with-pattern case |
| 100 | +and substitutes `glob`. |
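The two `kind` rules (force `None` without a pattern, default to `glob` with one) combine into a small decision function. The sketch below assumes a hypothetical `PatternKind` enum and `resolve_kind` helper; the real builder may structure this differently.

```rust
/// Hypothetical enum for the two supported pattern kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PatternKind {
    Glob,
    Regex,
}

/// Decides the `kind` that crosses IPC, given what the parser recovered.
fn resolve_kind(
    pattern: Option<&str>,
    parsed_kind: Option<PatternKind>,
) -> Option<PatternKind> {
    match pattern.map(str::trim) {
        // Missing or blank pattern: never send a kind, so the frontend
        // doesn't try to compile a half-built query.
        None | Some("") => None,
        // Pattern present: keep the model's kind, or fall back to glob.
        Some(_) => Some(parsed_kind.unwrap_or(PatternKind::Glob)),
    }
}
```

With this shape, an invalid `kind:` value that the parser already dropped to `None` is handled the same way as an omitted one.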
| 101 | + |
| 102 | +## Real-LLM eval results |
| 103 | + |
| 104 | +The prompt + parser are pinned by `selection/ai/real_llm_eval_test.rs`, six |
| 105 | +`#[ignore]`-gated integration tests against the live OpenAI API. Run them with: |
| 106 | + |
| 107 | +```sh |
| 108 | +OPENAI_API_KEY=$(security find-generic-password -a "$USER" -s "OPENAI_API_KEY" -w) \ |
| 109 | + cargo nextest run --lib --run-ignored only selection::ai::real_llm_eval_test |
| 110 | +``` |
| 111 | + |
| 112 | +The default model is `gpt-4o-mini` (cheap, fast, comparable to the model David has |
| 113 | +configured in his Settings UI for everyday use). When David's cloud-provider model |
| 114 | +changes, edit `MODEL` in the eval file and rerun. |
| 115 | + |
| 116 | +| Intent | Sample shape | Assertions | Status | |
| 117 | +|---|---|---|---| |
| 118 | +| "all log files" | mixed `.log` / `.txt` / `.md` / `.png` | pattern contains `log`, `kind` set | passing | |
| 119 | +| "png and jpg images" | mixed image + text extensions | pattern mentions both png and jpg/jpeg | passing | |
| 120 | +| "files bigger than 5 MB" | mixed sizes | `size_min` ∈ [4 MB, 10 MB], pattern present | passing | |
| 121 | +| "backups from last week" | `*-backup-*` files plus noise | `modified_after` set | passing | |
| 122 | +| "every rymd file" | `rymd-*.pdf` plus noise | pattern matches the keyword | passing | |
| 123 | +| "final drafts I haven't shared" | `Final-*` files | pattern OR caveat present (no half-built query) | passing | |
| 124 | + |
| 125 | +The eval also surfaces drift: a prompt change that breaks one of these assertions |
| 126 | +fails here, before the dialog is built on top of it. Iterate the prompt, rerun the eval, ship |
| 127 | +the prompt change with green tests. |
| 128 | + |
| 129 | +For ad-hoc debugging (peek at the raw model response), add an `eprintln!` to the |
| 130 | +`translate` helper temporarily (allowed in `#[cfg(test)]` blocks for `--no-capture` |
| 131 | +runs); revert before commit so the crate-level deny on `print_stderr` stays clean. |
| 132 | +Alternatively, run the dialog in the live app and watch the log output of |
| 133 | +`RUST_LOG=cmdr_lib::selection::ai=debug pnpm dev`. |
| 134 | + |
| 135 | +## IPC surface |
| 136 | + |
| 137 | +All commands live in `crate::commands::selection`: |
| 138 | + |
| 139 | +| Command | Purpose | |
| 140 | +|---|---| |
| 141 | +| `translate_selection_query(prompt, sample_names)` | AI translation; cloud-only. Returns `SelectionTranslateResult` or an error string. | |
| 142 | +| `get_recent_selections(limit)` | Returns persisted entries (newest first). | |
| 143 | +| `add_recent_selection(entry, max_count)` | Adds + dedupes + caps. | |
| 144 | +| `remove_recent_selection(id)` | Removes by id; no-op when missing. | |
| 145 | +| `clear_recent_selections()` | Drops every entry. | |
| 146 | +| `apply_recent_selections_max_count(max_count)` | Live-applies a freshly-tuned cap. | |
| 147 | + |
| 148 | +All six are registered in `crate::ipc::builder` (runtime dispatch) and |
| 149 | +`crate::ipc_collectors::collect_cross_platform_types` (specta). The bindings appear |
| 150 | +in `apps/desktop/src/lib/ipc/bindings.ts`; the typed wrappers live in |
| 151 | +`apps/desktop/src/lib/tauri-commands/selection.ts`. |
| 152 | + |
| 153 | +## Coupling to other modules |
| 154 | + |
| 155 | +- `crate::search::history`: re-exports `HistoryMode` and `HistoryFilters`. One-way. |
| 156 | +- `crate::ai::manager` + `crate::ai::client`: backend resolution and chat completion. |
| 157 | + Mirrors `crate::commands::search`'s usage exactly. |
| 158 | +- `crate::config::resolved_app_data_dir`: shared persistence-path resolver. |
| 159 | + |
| 160 | +No other modules depend on `selection`; the dialog frontend and command-dispatch wiring |
| 161 | +land in M7. |