| 1 | +# macOS Spotlight `index.spotlightV3` IVF rotation deadlock — diagnosis, format notes, mitigations, and mechanistic detection |
| 2 | + |
| 3 | +**Date:** 2026-06-05 |
| 4 | +**Author:** validate LLM (for Peter; safe to hand to another LLM/engineer) |
| 5 | +**Machine under study:** Peter's, macOS 26.x ("Tahoe"-era), `mds_stores` `IVFVectorIndex::unlink:4752` loop; iMessage search dead for months. |
| 6 | +**Status:** investigation complete. **No remediation performed** — all options below are for review. |
| 7 | + |
| 8 | +> SCOPE NOTE: Everything about the `.ivf-vector-indexes` binary layout below is **reverse-engineered |
| 9 | +> from byte-pattern inspection of one machine in ~10 minutes**. It is internally consistent and |
| 10 | +> matches the observed failure, but it is **inference, not Apple-documented fact**. The `8tsd` |
| 11 | +> store format, by contrast, is well-reverse-engineered publicly (Yogesh Khatri's `spotlight_parser`). |
| 12 | +> Treat IVF-layer claims as "strong hypothesis"; treat 8tsd claims as "documented." |
| 13 | + |
| 14 | +--- |
| 15 | + |
| 16 | +## 1. The observed failure |
| 17 | + |
| 18 | +Unified-log signature (repeating dozens of times/sec, indefinitely): |
| 19 | +``` |
| 20 | +mds_stores: (SpotlightIndex) [com.apple.spotlightindex:IVFVectorIndex] |
| 21 | + unlink:4752: IVFVectorIndex::unlink <private> failed 0 <private> |
| 22 | +``` |
| 23 | +Symptoms: terminal/UI responsiveness loss post-reboot (FD/CPU pressure from the busy-loop), and |
| 24 | +iMessage search returning zero results for months. `mds_stores` was at 0% CPU at the exact moment |
| 25 | +of measurement (between bursts) but the on-disk state was **actively mutating during a 10-minute |
| 26 | +window** (live rotation files grew from `live.7`/gen 416 to `live.10`/gen 419), confirming the loop |
| 27 | +is live, not a stale log artifact. |
| 28 | + |
| 29 | +--- |
| 30 | + |
| 31 | +## 2. On-disk layout |
| 32 | + |
| 33 | +`~/Library/Metadata/CoreSpotlight/<DOMAIN>/index.spotlightV3/` for four protection domains: |
| 34 | + |
| 35 | +| Domain | Backs | `store.db` size | `live.N` file count | Status |
| 36 | +|---|---|---|---|---|
| 37 | +| `Priority` | high-priority items | 9.8 MB | 6 | **healthy** |
| 38 | +| `NSFileProtectionComplete` | data unavailable while the device is locked | 36 KB | 1 | healthy/idle |
| 39 | +| `NSFileProtectionCompleteUnlessOpen` | as above, except files already open stay accessible | 36 KB | 1 | healthy/idle |
| 40 | +| `NSFileProtectionCompleteUntilFirstUserAuthentication` (NSFPCUFUA below) | **Messages/iMessage** (available after first unlock) | **559 MB** | **8→11, climbing** | **BROKEN** |
| 41 | + |
| 42 | +Each domain dir contains: |
| 43 | +- `store.db` (+ `.store.db` shadow): the **`8tsd`** metadata store (NOT SQLite — see §3). |
| 44 | +- `0.ivf-vector-indexes`: the **committed** IVF reference record. |
| 45 | +- `live.N.ivf-vector-indexes`: **rotation snapshots**, N increasing. Each is a tiny (8–40 byte) |
| 46 | + reference record, NOT bulk vector data. |
| 47 | +- plus classic Spotlight sidecars (`0.indexHead`, `0.indexIds`, `0.directoryStoreFile`, …). |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +## 3. The `8tsd` store format (DOCUMENTED — Khatri `spotlight_parser`, verified vs Peter's bytes) |
| 52 | + |
| 53 | +All `*.db` files here begin with magic `38 74 73 64` = `"8tsd"` (NOT `SQLite format 3`; a grep for that string returns 0 matches). |
| 54 | +It is a **paged store** (conceptually SQLite-like: a b-tree-ish block directory), with Apple's own layout. |
| 55 | + |
| 56 | +Header (little-endian; offsets per Khatri, confirmed on Peter's files): |
| 57 | +| Off | Field | Notes | |
| 58 | +|---|---|---| |
| 59 | +| 0x00 | magic `8tsd` | (`7tsd` = older v1 with shifted offsets) | |
| 60 | +| 0x04 | flags | Peter: `0x10801` | |
| 61 | +| 0x24 | header_size | 4096 | |
| 62 | +| 0x28 | block0_size | 16384 (Priority) / 557056 (NSFPCUFUA) | |
| 63 | +| 0x2C | block_size | 16384 | |
| 64 | +| 0x30–0x40 | index_blocktype 0x11/0x21/0x41/0x81×2 | all 0 in these V3 stores | |
| 65 | +| 0x144 | original_path (256B UTF-8) | decodes to the literal file path — **good cheap integrity signal** | |
| 66 | + |
| 67 | +Block 0 at offset `header_size`: magic `1mbd`/`2mbd`, `u32 item_count`, then 16-byte entries |
| 68 | +`<QII>` = (last_id_in_block, offset_index, dest_block_size). Regular blocks: magic `2pbd`, per-block |
| 69 | +compression (LZ4 `bv4*` if type&0x1000, LZFSE `bvx*` if &0x2000, else zlib `0x78`). Records are |
| 70 | +Spotlight-custom-varint encoded, keyed against the 0x11/0x21/0x81 dictionary tables. |
| 71 | + |
| 72 | +**Peter's 559 MB store is structurally coherent:** header valid, `original_path` clean, block-0 |
| 73 | +`1mbd` with `item_count=34104`, and `item_count × block_size = 558,759,936 ≈ file size (0.999)`. |
| 74 | +**So the store.db is NOT torn.** The fault is in the IVF reference layer, not the metadata store. |
| 75 | + |
| 76 | +Canonical ref: https://github.com/ydkhatri/spotlight_parser (trust its offsets over libyal/dtformats). |
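
The header fields in the table above can be checked with a few lines of Python. This is a minimal sketch, not Khatri's parser: the names `check_8tsd_header` and `coverage_ratio` and the returned dict keys are illustrative, and it assumes `original_path` is NUL-padded inside its 256-byte field.

```python
import struct

PATH_OFF, PATH_LEN = 0x144, 256


def check_8tsd_header(buf: bytes) -> dict:
    """Decode the header fields from the table above and sanity-check them."""
    if len(buf) < PATH_OFF + PATH_LEN:
        raise ValueError("buffer too short to hold an 8tsd header")
    (flags,) = struct.unpack_from("<I", buf, 0x04)
    # header_size, block0_size, block_size sit back to back at 0x24/0x28/0x2C.
    header_size, block0_size, block_size = struct.unpack_from("<III", buf, 0x24)
    # Assumption: the path is NUL-padded inside its fixed 256-byte field.
    raw_path = buf[PATH_OFF:PATH_OFF + PATH_LEN].split(b"\x00", 1)[0]
    try:
        path = raw_path.decode("utf-8")
    except UnicodeDecodeError:
        path = None
    return {
        "magic_ok": buf[:4] == b"8tsd",
        "flags": flags,
        "header_size": header_size,
        "block0_size": block0_size,
        "block_size": block_size,
        "original_path": path,
        "path_ok": bool(path),
    }


def coverage_ratio(item_count: int, block_size: int, file_size: int) -> float:
    """item_count * block_size over file size; close to 1.0 for an intact store."""
    return item_count * block_size / file_size
```

For the store above, `34104 * 16384` is 558,759,936, which is the product compared against the file size.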
| 77 | + |
| 78 | +--- |
| 79 | + |
| 80 | +## 4. The `.ivf-vector-indexes` reference record (REVERSE-ENGINEERED — hypothesis) |
| 81 | + |
| 82 | +Decoded as a sequence of little-endian `u32`: |
| 83 | +``` |
| 84 | +[ generation, MAGIC, (id, type), (id, type), ..., (0, 0)? ] |
| 85 | + generation : u32, monotonically increasing per write |
| 86 | + MAGIC : 0x015F1DA6 (23010726) constant in every file (format/version stamp) |
| 87 | + (id,type) : live index-entry references; type seen = 655378 (0x000A0012) |
| 88 | + trailing (0,0) : padding/terminator (present in some, absent in others) |
| 89 | +``` |
| 90 | +`0.ivf-vector-indexes` (abbreviated `0.ivf` below) = committed state (all domains at generation 1, `live_ids=[]`). `live.N` = rotation log; |
| 91 | +each finalize attempt writes a new `live.N+1` with the current live set, then *should* fold into `0.ivf`. |
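
The record layout above is small enough to decode directly. The sketch below assumes the hypothesised schema holds (little-endian `u32` words, generation then magic, then `(id, type)` pairs). The function name `decode_ivf_record` is illustrative, and treating a `(0, 0)` pair as a terminator is part of the hypothesis, not confirmed behaviour.

```python
import struct

IVF_MAGIC = 0x015F1DA6  # constant seen in every file on the one machine studied


def decode_ivf_record(buf: bytes):
    """Return (generation, magic_ok, live_ids) under the hypothesised layout."""
    if len(buf) < 8:
        raise ValueError("record shorter than the 8-byte generation+magic prefix")
    n = len(buf) // 4
    words = struct.unpack("<%dI" % n, buf[: n * 4])
    generation, magic = words[0], words[1]
    # Remaining words are (id, type) pairs; a (0, 0) pair is treated as padding.
    pairs = zip(words[2::2], words[3::2])
    live_ids = [ident for ident, kind in pairs if (ident, kind) != (0, 0)]
    return generation, magic == IVF_MAGIC, live_ids
```

Returning `magic_ok` instead of raising lets a caller report a record with an unexpected magic without aborting a whole directory scan.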
| 92 | + |
| 93 | +### Evidence table (Peter's machine, 2026-06-05) |
| 94 | + |
| 95 | +| Domain | committed gen | max live gen | GAP | terminal live_ids | verdict | |
| 96 | +|---|---|---|---|---|---| |
| 97 | +| Priority | 1 | 58 | 57 | `[]` (drained at gen 58) | healthy | |
| 98 | +| NSFProtComplete | 1 | 2 | 1 | `[]` | healthy | |
| 99 | +| NSFProtCompleteUnlessOpen | 1 | 2 | 1 | `[]` | healthy | |
| 100 | +| **NSFPCUFUA (Messages)** | 1 | **419 (climbing)** | **418** | **`[532]` never drains** | **DEADLOCK** | |
| 101 | + |
| 102 | +Per-generation live set for NSFPCUFUA (chronological by gen; one line per `live.N` file on disk, so generation numbers are not contiguous): |
| 103 | +``` |
| 104 | +gen 243: [532, 1054] |
| 105 | +gen 245: [532, 533, 1054] |
| 106 | +gen 405: [] |
| 107 | +gen 406: [532, 1054] |
| 108 | +gen 413: [532, 533, 1054] |
| 109 | +gen 414: [532, 1054] |
| 110 | +gen 415: [532] |
| 111 | +gen 416: [532] |
| 112 | +gen 417: [532] |
| 113 | +gen 418: [532] |
| 114 | +gen 419: [532] <- still climbing during observation |
| 115 | +``` |
| 116 | + |
| 117 | +### Interpretation |
| 118 | +- **A healthy domain drains:** Priority reaches its highest generation (58) with `live_ids=[]` — the |
| 119 | + work completed and committed. |
| 120 | +- **The broken domain cannot drain ID `532`:** every recent generation (415–419) carries exactly |
| 121 | + `[532]` and the generation counter keeps climbing with no progress. `IVFVectorIndex::unlink:4752` |
| 122 | + is failing to remove entry 532; each failed attempt rotates a new `live.N`. Classic **liveness |
| 123 | + bug / spin-deadlock**: monotonic counter advance + zero state progress. |
| 124 | +- ID 532 is the "stuck tombstone": referenced-but-unremovable. (533/1054 churn and eventually clear; |
| 125 | + 532 is wedged.) |
| 126 | + |
| 127 | +--- |
| 128 | + |
| 129 | +## 5. Mitigations (for Peter to choose; risk/cost spelled out) |
| 130 | + |
| 131 | +| # | Action | Mechanism | Risk | Cost | Reversible? | |
| 132 | +|---|---|---|---|---|---| |
| 133 | +| M1 | **Full reindex**: `sudo mdutil -i off /` → `sudo mdutil -E /` → `sudo mdutil -i on /` | Apple-sanctioned; nukes & rebuilds ALL Spotlight indexes | Low (Apple-supported). Search degraded until reindex done. | High time: hours of CPU to reindex a full disk | Index rebuilds from scratch; no user data lost | |
| 134 | +| M2 | **Domain-surgical**: quit Spotlight indexing, `mv NSFPCUFUA/index.spotlightV3 ~/.Trash/`, let mds rebuild just that domain | Removes only the broken domain; others untouched | Med — relies on this RE diagnosis that NSFPCUFUA is the sole culprit; backs Messages search | Low-Med: only the Messages domain reindexes | Yes — it's in Trash; restorable | |
| 135 | +| M3 | **Micro-surgical**: remove only the stuck `live.N`/`0.ivf` for NSFPCUFUA | Tries to break the rotation without full domain rebuild | **High** — depends entirely on RE'd semantics; could leave store↔IVF inconsistent and make it worse | Lowest if it works | Yes if files trashed not deleted, but state may be incoherent | |
| 136 | +| M4 | **Do nothing / monitor** | — | The loop continues: periodic CPU/FD pressure, no iMessage search | None now, ongoing pain | n/a | |
| 137 | + |
| 138 | +**Recommendation for Peter:** M1 is the safe default (doesn't trust my RE). M2 is the targeted |
| 139 | +option with the best effort/reward *if* the RE diagnosis is accepted (it's well-supported). M3 only |
| 140 | +with a full backup and acceptance of risk. **None should be run without Peter's explicit go.** |
| 141 | + |
| 142 | +Prerequisite for any option: a current Time Machine or equivalent backup, since Spotlight state is being mutated. |
| 143 | + |
| 144 | +--- |
| 145 | + |
| 146 | +## 6. What validate can detect mechanistically (the shippable insight) |
| 147 | + |
| 148 | +This failure is a **generic, code-checkable pattern**, not Messages-specific. validate can flag |
| 149 | +"Spotlight index in a stuck/deadlock-looking state" with zero IVF RE risk, purely from observable |
| 150 | +invariants: |
| 151 | + |
| 152 | +### Check A — committed/live generation gap (liveness) |
| 153 | +For each `index.spotlightV3` domain: `gap = max(live.N gen) - committed(0.ivf) gen`. |
| 154 | +A large gap with a **non-empty, non-draining terminal live set** = stuck rotation. Healthy domains |
| 155 | +either have small gaps or drain to `live_ids=[]` at the top generation. |
| 156 | +- WARN threshold candidate: gap > (small constant, e.g. 32) AND terminal live_ids non-empty. |
| 157 | +- FAIL/strong-WARN: gap > 256 AND the same id present in the last K generations (never drains). |
| 158 | + |
| 159 | +### Check B — never-draining reference (the tombstone) |
| 160 | +Compute the set of ids present in the **highest-generation** `live.N`. If any id persists across the |
| 161 | +last K retained snapshots (e.g. K=8; retained generation numbers need not be contiguous) without disappearing, flag it as a stuck/undead reference. |
| 162 | +This is the `unlink` target. It is purely set arithmetic over the decoded reference records, so it carries no risk. |
| 163 | + |
| 164 | +### Check C — rotation-count explosion |
| 165 | +A `live.N` file count far exceeding peer domains (here 11, vs 1 for the two idle domains and 6 for Priority) is a cheap smoke signal of a domain that |
| 166 | +can't finalize. |
| 167 | + |
| 168 | +### Check D — `8tsd` structural integrity (independent, generally useful) |
| 169 | +Header sane (magic, header_size/block_size, original_path is valid UTF-8 path), block-0 `1mbd`/`2mbd` |
| 170 | +map present, `item_count × block_size ≈ file_size`, regular blocks `2pbd` with sizes ≤ block_size and |
| 171 | +`next_block_index` chains that terminate without cycles, and per-block decompress success. Catches |
| 172 | +torn/truncated stores (a *different* corruption class than the deadlock). |
| 173 | + |
| 174 | +### Generalization beyond Spotlight |
| 175 | +The deadlock heuristic — **"a monotonically advancing generation/sequence counter combined with a |
| 176 | +non-draining work set"** — is a reusable signature for *any* rotation/journal/WAL-like structure |
| 177 | +(other Apple caches, app journals, etc.). Worth framing the validate check generically: |
| 178 | +`detectStuckRotation(committed_gen, live_gens[], live_sets[])`. |
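
A generic form of Checks A and B might look like the following Python sketch. The signature mirrors the `detectStuckRotation` framing above. The default thresholds are the candidate values from Checks A and B, and the tuple return shape is an assumption made for illustration.

```python
def detect_stuck_rotation(committed_gen, live_gens, live_sets,
                          gap_warn=32, gap_fail=256, k=8):
    """Return (verdict, ids) where verdict is 'OK', 'WARN' or 'FAIL'."""
    if not live_gens:
        return "OK", set()
    # Order snapshots by generation so "terminal" means highest generation.
    ordered = sorted(zip(live_gens, live_sets), key=lambda p: p[0])
    sets = [set(s) for _, s in ordered]
    gap = ordered[-1][0] - committed_gen
    terminal = sets[-1]
    # Ids present in every one of the last k snapshots never drained (Check B).
    stuck = set.intersection(*sets[-k:]) if len(sets) >= k else set()
    if gap > gap_fail and stuck:
        return "FAIL", stuck
    if gap > gap_warn and terminal:
        return "WARN", terminal
    return "OK", set()
```

Fed the NSFPCUFUA generations and live sets from §4 with `committed_gen=1`, this returns `("FAIL", {532})`. A domain whose terminal live set is empty returns `"OK"` regardless of gap, which matches the Priority case.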
| 179 | + |
| 180 | +--- |
| 181 | + |
| 182 | +## 7. Proposed validate deliverables (no machine action; pure code + tests) |
| 183 | + |
| 184 | +1. `src/core/spotlight_store.zig` — `8tsd` header + block-0 + block-walk structural validator (Check D). |
| 185 | +2. `src/core/spotlight_ivf.zig` — decode `.ivf-vector-indexes` reference records; expose |
| 186 | + `(generation, magic_ok, live_ids[])`. |
| 187 | +3. Deadlock heuristics (Checks A/B/C) over an `index.spotlightV3` directory → health verdict. |
| 188 | +4. Wire into `validateSpotlight` (`apple_validators.zig:352-375`, currently magic-only) as a |
| 189 | + deeper-than-structural path; and/or a `validate --spotlight-health <dir>` report subcommand. |
| 190 | +5. **Test fixtures** (synthetic, checked-in): a "healthy" set (small gap, draining live set) and a |
| 191 | + "deadlock" set (huge gap, never-draining id across K generations) — built byte-for-byte to the |
| 192 | + §4 schema. Plus a torn-`8tsd` fixture for Check D. Red-proof each heuristic. |
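
As a sketch of how the synthetic IVF fixtures in item 5 could be generated, the Python below encodes records to the §4 schema and writes them under a temporary directory. The helper names `encode_ivf_record` and `write_fixture` are hypothetical, and numbering snapshots from `live.0` is an assumption. The real deliverable would presumably check in the resulting bytes, not generate them at test time.

```python
import struct
import tempfile
from pathlib import Path

IVF_MAGIC = 0x015F1DA6
ENTRY_TYPE = 655378  # 0x000A0012, the only type value observed


def encode_ivf_record(generation, live_ids, terminator=False):
    """Pack [generation, MAGIC, (id, type)..., optional (0, 0)] as LE u32 words."""
    words = [generation, IVF_MAGIC]
    for ident in live_ids:
        words += [ident, ENTRY_TYPE]
    if terminator:
        words += [0, 0]
    return struct.pack("<%dI" % len(words), *words)


def write_fixture(root, committed_gen, snapshots):
    """snapshots: list of (generation, live_ids), written as live.0, live.1, ..."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "0.ivf-vector-indexes").write_bytes(encode_ivf_record(committed_gen, []))
    for n, (gen, ids) in enumerate(snapshots):
        (root / f"live.{n}.ivf-vector-indexes").write_bytes(encode_ivf_record(gen, ids))
    return root


# "Deadlock" fixture: id 532 never drains across eight snapshots (gens 412..419).
deadlock = write_fixture(Path(tempfile.mkdtemp()) / "deadlock", 1,
                         [(g, [532]) for g in range(412, 420)])
```

A matching "healthy" fixture would use a small gap and end with an empty live set, for example `write_fixture(path, 1, [(2, [])])`.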
| 193 | + |
| 194 | +## 8. Open questions / caveats for whoever picks this up |
| 195 | +- IVF `type` field meaning (`655378` = `0x000A0012`) unconfirmed — may encode dimension/quantizer. |
| 196 | +- Whether `0.ivf` committed-gen ever advances past 1 on a healthy machine (all 4 domains here = 1; |
| 197 | + need a second machine to confirm 1 is truly the baseline vs already-stuck-everywhere). |
| 198 | +- The exact store→IVF id linkage (does `8tsd` store reference IVF ids 532/533/1054?) — needs the |
| 199 | + store record decode (Track B depth) to fully close; the deadlock heuristic does NOT require it. |
| 200 | +- `.ivf-vector-indexes` layout is single-machine RE; validate before generalizing. |
| 201 | + |
| 202 | +## 9. Reproduce the diagnosis (read-only, safe) |
| 203 | +```bash |
| 204 | +cd ~/Library/Metadata/CoreSpotlight |
| 205 | +for dom in */index.spotlightV3; do |
| 206 | + for f in "$dom"/*.ivf-vector-indexes; do |
| 207 | + python3 - "$f" <<'PY' |
| 208 | +import sys,struct |
| 209 | +b=open(sys.argv[1],'rb').read(); v=list(struct.unpack('<%dI'%(len(b)//4), b[:len(b)//4*4])) |
| 210 | +gen=v[0]; ids=[v[i] for i in range(2,len(v)-1,2) if v[i]!=0] |
| 211 | +print(f"{sys.argv[1]:70s} gen={gen:4d} live_ids={ids}") |
| 212 | +PY |
| 213 | + done |
| 214 | +done |
| 215 | +``` |
| 216 | +Compare generation spread + terminal live set per domain; the broken one has a huge gap and a |
| 217 | +never-draining id. |