
Commit 51c7edf

spotlight: detect IVF rotation deadlock + decode .ivf-vector-indexes records
Adds src/core/spotlight_ivf.zig: a pure-Zig decoder for macOS Spotlight .ivf-vector-indexes reference records (generation, magic, live-id set) plus a generic stuck-rotation heuristic. It detects the mds_stores IVFVectorIndex::unlink busy-loop signature: a monotonically climbing generation counter with a non-draining live id across the most recent K rotations (the index entry that can never be unlinked, so each finalize rotates a new live.N forever).

Reverse-engineered 2026-06-05 from a real stuck machine (Messages/iMessage search dead for months). Full diagnosis, 8tsd format notes, and mitigations are in docs/spotlight-ivf-deadlock-diagnosis-2026-06-05.md; the design/impl plan is in docs/superpowers/plans/2026-06-05-spotlight-8tsd-deep-validation.md.

Tests (red-proofed):
- decode (empty/live/bad-magic/too-short)
- healthy-drains
- deadlock (id wedged across generations -> .deadlock)
- suspicious (big gap but ids progress -> .suspicious)
- empty-rotation edge

Neutralizing the stuck-id detection degrades the deadlock case to .suspicious and fails the test.

Registered in mod.zig (pub export + test block). nix checks.test green. Pure logic only (no I/O), per the hexagonal arch rule; wiring into validateSpotlight, the 8tsd structural checks, and the synthetic-fixture I/O tests are follow-ups per the plan.
1 parent c7d7173 commit 51c7edf

4 files changed

Lines changed: 665 additions & 0 deletions


Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
1+
# macOS Spotlight `index.spotlightV3` IVF rotation deadlock — diagnosis, format notes, mitigations, and mechanistic detection
2+
3+
**Date:** 2026-06-05
4+
**Author:** validate LLM (for Peter; safe to hand to another LLM/engineer)
5+
**Machine under study:** Peter's, macOS 26.x ("Tahoe"-era), `mds_stores` `IVFVectorIndex::unlink:4752` loop; iMessage search dead for months.
6+
**Status:** investigation complete. **No remediation performed** — all options below are for review.
7+
8+
> SCOPE NOTE: Everything about the `.ivf-vector-indexes` binary layout below is **reverse-engineered
9+
> from byte-pattern inspection of one machine in ~10 minutes**. It is internally consistent and
10+
> matches the observed failure, but it is **inference, not Apple-documented fact**. The `8tsd`
11+
> store format, by contrast, is well-reverse-engineered publicly (Yogesh Khatri's `spotlight_parser`).
12+
> Treat IVF-layer claims as "strong hypothesis"; treat 8tsd claims as "documented."
13+
14+
---
15+
16+
## 1. The observed failure
17+
18+
Unified-log signature (repeating dozens of times/sec, indefinitely):
19+
```
mds_stores: (SpotlightIndex) [com.apple.spotlightindex:IVFVectorIndex]
unlink:4752: IVFVectorIndex::unlink <private> failed 0 <private>
```
23+
Symptoms: terminal/UI responsiveness loss post-reboot (FD/CPU pressure from the busy-loop), and
24+
iMessage search returning zero results for months. `mds_stores` was at 0% CPU at the exact moment
25+
of measurement (between bursts) but the on-disk state was **actively mutating during a 10-minute
26+
window** (live rotation files grew from `live.7`/gen 416 to `live.10`/gen 419), confirming the loop
27+
is live, not a stale log artifact.
28+
29+
---
30+
31+
## 2. On-disk layout
32+
33+
`~/Library/Metadata/CoreSpotlight/<DOMAIN>/index.spotlightV3/` for four protection domains:
34+
35+
| Domain | Backs | `store.db` size | `live.N` file count | Status |
|---|---|---|---|---|
| `Priority` | high-priority items | 9.8 MB | 6 | **healthy** |
| `NSFileProtectionComplete` | data inaccessible while the device is locked | 36 KB | 1 | healthy/idle |
| `NSFileProtectionCompleteUnlessOpen` | as above | 36 KB | 1 | healthy/idle |
| `NSFileProtectionCompleteUntilFirstUserAuthentication` | **Messages/iMessage** (available after first unlock) | **559 MB** | **8→11 climbing** | **BROKEN** |
41+
42+
Each domain dir contains:
43+
- `store.db` (+ `.store.db` shadow): the **`8tsd`** metadata store (NOT SQLite; see §3).
- `0.ivf-vector-indexes`: the **committed** IVF reference record.
- `live.N.ivf-vector-indexes`: **rotation snapshots**, N increasing. Each is a tiny (8–40 byte) reference record, NOT bulk vector data.
- plus classic Spotlight sidecars (`0.indexHead`, `0.indexIds`, `0.directoryStoreFile`, …).
48+
49+
---
50+
51+
## 3. The `8tsd` store format (DOCUMENTED — Khatri `spotlight_parser`, verified vs Peter's bytes)
52+
53+
All `*.db` here begin with magic `38 74 73 64` = `"8tsd"` (NOT `SQLite format 3` — grep returns 0).
54+
It is a **paged store** (conceptually SQLite-like: a b-tree-ish block directory), with Apple's own layout.
55+
56+
Header (little-endian; offsets per Khatri, confirmed on Peter's files):
57+
| Offset | Field | Notes |
|---|---|---|
| 0x00 | magic `8tsd` | (`7tsd` = older v1 with shifted offsets) |
| 0x04 | flags | Peter: `0x10801` |
| 0x24 | header_size | 4096 |
| 0x28 | block0_size | 16384 (Priority) / 557056 (NSFPCUFUA) |
| 0x2C | block_size | 16384 |
| 0x30–0x40 | index_blocktype 0x11/0x21/0x41/0x81×2 | all 0 in these V3 stores |
| 0x144 | original_path (256B UTF-8) | decodes to the literal file path; a **good cheap integrity signal** |
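
The header fields in the table can be read with a few `struct` calls. The following is a minimal Python sketch, assuming the offsets above are right, that `flags` is a 32-bit field, and that `original_path` is NUL-padded. `parse_8tsd_header` is a hypothetical helper name, not part of validate or `spotlight_parser`.

```python
import struct

# Offsets copied from the header table above (all little-endian).
MAGIC_OFF, FLAGS_OFF = 0x00, 0x04
SIZES_OFF = 0x24            # header_size, block0_size, block_size are adjacent u32s
PATH_OFF, PATH_LEN = 0x144, 256


def parse_8tsd_header(buf: bytes) -> dict:
    """Decode the header fields listed in the table (hypothetical helper)."""
    if len(buf) < PATH_OFF + PATH_LEN:
        raise ValueError("buffer too short for an 8tsd header")
    magic = buf[MAGIC_OFF:MAGIC_OFF + 4]
    if magic != b"8tsd":
        raise ValueError(f"unexpected magic {magic!r}")
    # Assumption: flags is a single u32 at 0x04.
    (flags,) = struct.unpack_from("<I", buf, FLAGS_OFF)
    header_size, block0_size, block_size = struct.unpack_from("<III", buf, SIZES_OFF)
    # Assumption: the 256-byte path field is NUL-padded UTF-8.
    raw_path = buf[PATH_OFF:PATH_OFF + PATH_LEN].split(b"\x00", 1)[0]
    return {
        "flags": flags,
        "header_size": header_size,
        "block0_size": block0_size,
        "block_size": block_size,
        "original_path": raw_path.decode("utf-8"),  # raises on invalid UTF-8
    }
```

Comparing the decoded `original_path` with the path the file was actually read from is the cheap integrity signal the table mentions.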
66+
67+
Block 0 at offset `header_size`: magic `1mbd`/`2mbd`, `u32 item_count`, then 16-byte entries `<QII>` = (last_id_in_block, offset_index, dest_block_size). Regular blocks: magic `2pbd`, per-block compression (LZ4 `bv4*` if type&0x1000, LZFSE `bvx*` if type&0x2000, else zlib `0x78`). Records are Spotlight-custom-varint encoded, keyed against the 0x11/0x21/0x81 dictionary tables.
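
The per-block compression rule above maps directly onto a small lookup. This is a sketch only: the function names are hypothetical, and the precedence when both type bits are set (LZ4 first) is an assumption taken from the order in which the rule is stated.

```python
import zlib

# Leading payload bytes named in the text for each codec.
CODEC_PREFIX = {"lz4": b"bv4", "lzfse": b"bvx", "zlib": b"\x78"}


def expected_codec(block_type: int) -> str:
    """Pick the codec from the block type bits (LZ4 checked first: an assumption)."""
    if block_type & 0x1000:
        return "lz4"
    if block_type & 0x2000:
        return "lzfse"
    return "zlib"


def payload_matches_codec(block_type: int, payload: bytes) -> bool:
    """True when the payload starts with the prefix its type bits predict."""
    return payload.startswith(CODEC_PREFIX[expected_codec(block_type)])


def try_inflate(payload: bytes) -> bool:
    """zlib is the only codec in the Python stdlib, so only that path is exercised."""
    try:
        zlib.decompress(payload)
        return True
    except zlib.error:
        return False
```

A mismatch between the type bits and the payload prefix is a structural warning in its own right, even before any decompression is attempted.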
71+
72+
**Peter's 559 MB store is structurally coherent:** header valid, `original_path` clean, block-0
73+
`1mbd` with `item_count=34104`, and `item_count × block_size = 558,759,936 ≈ file size (0.999)`.
74+
**So the store.db is NOT torn.** The fault is in the IVF reference layer, not the metadata store.
75+
76+
Canonical ref: https://github.com/ydkhatri/spotlight_parser (trust its offsets over libyal/dtformats).
77+
78+
---
79+
80+
## 4. The `.ivf-vector-indexes` reference record (REVERSE-ENGINEERED — hypothesis)
81+
82+
Decoded as a sequence of little-endian `u32`:
83+
```
[ generation, MAGIC, (id, type), (id, type), ..., (0, 0)? ]
generation     : u32, monotonically increasing per write
MAGIC          : 0x015F1DA6 (23010726) constant in every file (format/version stamp)
(id,type)      : live index-entry references; type seen = 655378 (0x000A0012)
trailing (0,0) : padding/terminator (present in some, absent in others)
```
90+
`0.ivf-vector-indexes` (abbreviated `0.ivf` below) holds the committed state (all domains at generation 1, `live_ids=[]`). The `live.N` files form the rotation log: each finalize attempt writes a new `live.N+1` with the current live set, then *should* fold into `0`.
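
A record with this layout can be decoded in a few lines. The sketch below is Python rather than the Zig in `src/core/spotlight_ivf.zig`, and it relies on the hypothesised layout above; `decode_ivf_record` is a hypothetical name, and treating only an all-zero `(0, 0)` pair as padding is an assumption.

```python
import struct

IVF_MAGIC = 0x015F1DA6  # constant observed in every file on the one machine studied


def decode_ivf_record(buf: bytes):
    """Return (generation, magic_ok, live_ids) for one reference record."""
    if len(buf) < 8:
        raise ValueError("record shorter than the 8-byte (generation, magic) prefix")
    # Interpret the file as little-endian u32 words; a trailing partial word is ignored.
    words = struct.unpack_from(f"<{len(buf) // 4}I", buf)
    generation, magic = words[0], words[1]
    # Remaining words are (id, type) pairs; an all-zero pair is treated as padding.
    pairs = zip(words[2::2], words[3::2])
    live_ids = [entry_id for entry_id, entry_type in pairs
                if (entry_id, entry_type) != (0, 0)]
    return generation, magic == IVF_MAGIC, live_ids
```

Returning `magic_ok` instead of raising lets a caller report a bad stamp while still showing the generation it found.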
92+
93+
### Evidence table (Peter's machine, 2026-06-05)
94+
95+
| Domain | committed gen | max live gen | GAP | terminal live_ids | verdict |
|---|---|---|---|---|---|
| Priority | 1 | 58 | 57 | `[]` (drained at gen 58) | healthy |
| NSFProtComplete | 1 | 2 | 1 | `[]` | healthy |
| NSFProtCompleteUnlessOpen | 1 | 2 | 1 | `[]` | healthy |
| **NSFPCUFUA (Messages)** | 1 | **419 (climbing)** | **418** | **`[532]` never drains** | **DEADLOCK** |
101+
102+
Per-generation live set for NSFPCUFUA (chronological by gen):
103+
```
gen 243: [532, 1054]
gen 245: [532, 533, 1054]
gen 405: []
gen 406: [532, 1054]
gen 413: [532, 533, 1054]
gen 414: [532, 1054]
gen 415: [532]
gen 416: [532]
gen 417: [532]
gen 418: [532]
gen 419: [532]   <- still climbing during observation
```
116+
117+
### Interpretation
118+
- **A healthy domain drains:** Priority reaches its highest generation (58) with `live_ids=[]`: the work completed and committed.
- **The broken domain cannot drain ID `532`:** every recent generation (415–419) carries exactly `[532]` and the generation counter keeps climbing with no progress. `IVFVectorIndex::unlink:4752` is failing to remove entry 532; each failed attempt rotates a new `live.N`. Classic **liveness bug / spin-deadlock**: monotonic counter advance + zero state progress.
- ID 532 is the "stuck tombstone": referenced-but-unremovable. (533/1054 churn and eventually clear; 532 is wedged.)
126+
127+
---
128+
129+
## 5. Mitigations (for Peter to choose; risk/cost spelled out)
130+
131+
| # | Action | Mechanism | Risk | Cost | Reversible? |
|---|---|---|---|---|---|
| M1 | **Full reindex**: `sudo mdutil -i off /`, then `sudo mdutil -E /`, then `sudo mdutil -i on /` | Apple-sanctioned; nukes & rebuilds ALL Spotlight indexes | Low (Apple-supported). Search degraded until reindex done. | High time: hours of CPU to reindex a full disk | Index rebuilds from scratch; no user data lost |
| M2 | **Domain-surgical**: quit Spotlight indexing, `mv NSFPCUFUA/index.spotlightV3 ~/.Trash/`, let mds rebuild just that domain | Removes only the broken domain; others untouched | Med: relies on this RE diagnosis that NSFPCUFUA is the sole culprit; backs Messages search | Low-Med: only the Messages domain reindexes | Yes: it's in Trash; restorable |
| M3 | **Micro-surgical**: remove only the stuck `live.N`/`0.ivf` for NSFPCUFUA | Tries to break the rotation without full domain rebuild | **High**: depends entirely on RE'd semantics; could leave store↔IVF inconsistent and make it worse | Lowest if it works | Yes if files trashed not deleted, but state may be incoherent |
| M4 | **Do nothing / monitor** | (none) | The loop continues: periodic CPU/FD pressure, no iMessage search | None now, ongoing pain | n/a |
137+
138+
**Recommendation for Peter:** M1 is the safe default (doesn't trust my RE). M2 is the targeted
139+
option with the best effort/reward *if* the RE diagnosis is accepted (it's well-supported). M3 only
140+
with a full backup and acceptance of risk. **None should be run without Peter's explicit go.**
141+
142+
Pre-req for any: Time Machine / backup current, since Spotlight state is being mutated.
143+
144+
---
145+
146+
## 6. What validate can detect mechanistically (the shippable insight)
147+
148+
This failure is a **generic, code-checkable pattern**, not Messages-specific. validate can flag
149+
"Spotlight index in a stuck/deadlock-looking state" with zero IVF RE risk, purely from observable
150+
invariants:
151+
152+
### Check A — committed/live generation gap (liveness)
153+
For each `index.spotlightV3` domain: `gap = max(live.N gen) - committed(0.ivf) gen`.
154+
A large gap with a **non-empty, non-draining terminal live set** = stuck rotation. Healthy domains
155+
either have small gaps or drain to `live_ids=[]` at the top generation.
156+
- WARN threshold candidate: gap > (small constant, e.g. 32) AND terminal live_ids non-empty.
157+
- FAIL/strong-WARN: gap > 256 AND the same id present in the last K generations (never drains).
158+
159+
### Check B — never-draining reference (the tombstone)
160+
Compute the set of ids present in the **highest-generation** `live.N`. If any id persists across the
161+
last K consecutive generations (e.g. K=8) without disappearing, flag it as a stuck/undead reference.
162+
This is the `unlink` target. Purely set-arithmetic over the decoded reference records — no risk.
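
Check B reduces to a set intersection over the newest K records. A minimal Python sketch follows; `stuck_ids` is a hypothetical name, and "last K generations" is read here as the K highest generations actually present on disk, since the observed generation numbers are not contiguous.

```python
def stuck_ids(live, k=8):
    """Ids present in every one of the k highest generations observed.

    live maps generation -> iterable of live ids, one entry per decoded live.N file.
    Returns an empty set when fewer than k generations are available.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    newest = sorted(live)[-k:]
    if len(newest) < k:
        return set()
    return set.intersection(*(set(live[gen]) for gen in newest))


# Per-generation live sets for NSFPCUFUA, copied from the §4 listing.
NSFPCUFUA = {
    243: [532, 1054], 245: [532, 533, 1054], 405: [],
    406: [532, 1054], 413: [532, 533, 1054], 414: [532, 1054],
    415: [532], 416: [532], 417: [532], 418: [532], 419: [532],
}
print(stuck_ids(NSFPCUFUA, k=8))
```

Requiring a full window of K records keeps a freshly created domain with only one or two rotations from being flagged.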
163+
164+
### Check C — rotation-count explosion
165+
`n live.N files` far exceeding peer domains (here 11 vs 1) is a cheap smoke signal of a domain that
166+
can't finalize.
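
A rough Python sketch of Check C, comparing each domain's `live.N` count against the median of its peers. The `factor=8` threshold is a placeholder chosen only so that the four domains observed here classify as described; it is not a tuned value, and both function names are hypothetical.

```python
import re
from statistics import median

LIVE_RE = re.compile(r"live\.(\d+)\.ivf-vector-indexes")


def count_live_files(filenames):
    """Count names of the form live.N.ivf-vector-indexes in a directory listing."""
    return sum(1 for name in filenames if LIVE_RE.fullmatch(name))


def rotation_outliers(counts, factor=8):
    """Return domains whose live.N count exceeds factor x the peer median.

    counts maps domain name -> number of live.N files.
    """
    flagged = []
    for domain, n in counts.items():
        peers = [c for d, c in counts.items() if d != domain]
        if not peers:
            continue
        baseline = max(1, median(peers))  # avoid a zero baseline
        if n > factor * baseline:
            flagged.append(domain)
    return flagged
```

Because a healthy but busy domain can also hold several rotation files, this is best treated as a prompt to run Checks A and B rather than as a verdict.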
167+
168+
### Check D — `8tsd` structural integrity (independent, generally useful)
169+
Header sane (magic, header_size/block_size, original_path is valid UTF-8 path), block-0 `1mbd`/`2mbd`
170+
map present, `item_count × block_size ≈ file_size`, regular blocks `2pbd` with sizes ≤ block_size and
171+
`next_block_index` chains that terminate without cycles, and per-block decompress success. Catches
172+
torn/truncated stores (a *different* corruption class than the deadlock).
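
Two of the Check D invariants are plain arithmetic and can be sketched without a full block parser. The helper names below are hypothetical, and the entry tuple layout follows the `<QII>` description in §3.

```python
def size_ratio(item_count: int, block_size: int, file_size: int) -> float:
    """item_count x block_size relative to the file size; about 1.0 for a coherent store."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    return item_count * block_size / file_size


def entries_in_bounds(entries, block_size: int, file_size: int) -> bool:
    """Check that every directory entry's offset_index x block_size lands inside the file.

    entries: (last_id_in_block, offset_index, dest_block_size) tuples from block 0.
    """
    return all(0 <= offset_index * block_size < file_size
               for _last_id, offset_index, _dest_size in entries)


# Figures from §3: item_count=34104 and block_size=16384 give 558,759,936 bytes.
print(34104 * 16384)
```

A truncated store shows up as a ratio well above 1.0 or as an entry pointing past the end of the file.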
173+
174+
### Generalization beyond Spotlight
175+
The deadlock heuristic — **"a monotonically advancing generation/sequence counter combined with a
176+
non-draining work set"** — is a reusable signature for *any* rotation/journal/WAL-like structure
177+
(other Apple caches, app journals, etc.). Worth framing the validate check generically:
178+
`detectStuckRotation(committed_gen, live_gens[], live_sets[])`.
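
One way the generic check might look, sketched in Python rather than Zig. The three verdict names mirror the healthy/suspicious/deadlock cases named in the commit's tests, but the thresholds (`k=5`, `gap_limit=32`) and the exact decision order are assumptions for illustration, not the shipped logic.

```python
def detect_stuck_rotation(committed_gen, live_gens, live_sets, k=5, gap_limit=32):
    """Classify a rotation log as 'healthy', 'suspicious', or 'deadlock'.

    live_gens[i] is the generation of rotation i; live_sets[i] is its live-id set.
    """
    if len(live_gens) != len(live_sets):
        raise ValueError("live_gens and live_sets must be the same length")
    if not live_gens:
        return "healthy"                      # nothing has rotated yet
    rotations = sorted(
        zip(live_gens, (set(s) for s in live_sets)), key=lambda r: r[0]
    )
    top_gen, terminal = rotations[-1]
    if not terminal:
        return "healthy"                      # drained at the highest generation
    if top_gen - committed_gen <= gap_limit:
        return "healthy"                      # small gap: work still in flight
    recent = [ids for _gen, ids in rotations[-k:]]
    wedged = set.intersection(*recent) if len(recent) >= k else set()
    # Large gap plus an id that never leaves the recent sets is the deadlock signature.
    return "deadlock" if wedged else "suspicious"
```

Nothing in the function is Spotlight-specific, so the same call could be pointed at any journal that exposes a sequence number and a pending-work set.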
179+
180+
---
181+
182+
## 7. Proposed validate deliverables (no machine action; pure code + tests)
183+
184+
1. `src/core/spotlight_store.zig`: `8tsd` header + block-0 + block-walk structural validator (Check D).
185+
2. `src/core/spotlight_ivf.zig` — decode `.ivf-vector-indexes` reference records; expose
186+
`(generation, magic_ok, live_ids[])`.
187+
3. Deadlock heuristics (Checks A/B/C) over an `index.spotlightV3` directory → health verdict.
188+
4. Wire into `validateSpotlight` (`apple_validators.zig:352-375`, currently magic-only) as a
189+
deeper-than-structural path; and/or a `validate --spotlight-health <dir>` report subcommand.
190+
5. **Test fixtures** (synthetic, checked-in): a "healthy" set (small gap, draining live set) and a
191+
"deadlock" set (huge gap, never-draining id across K generations) — built byte-for-byte to the
192+
§4 schema. Plus a torn-`8tsd` fixture for Check D. Red-proof each heuristic.
193+
194+
## 8. Open questions / caveats for whoever picks this up
195+
- IVF `type` field meaning (`655378` = `0x000A0012`) unconfirmed — may encode dimension/quantizer.
196+
- Whether `0.ivf` committed-gen ever advances past 1 on a healthy machine (all 4 domains here = 1;
197+
need a second machine to confirm 1 is truly the baseline vs already-stuck-everywhere).
198+
- The exact store→IVF id linkage (does `8tsd` store reference IVF ids 532/533/1054?) — needs the
199+
store record decode (Track B depth) to fully close; the deadlock heuristic does NOT require it.
200+
- `.ivf-vector-indexes` layout is single-machine RE; validate before generalizing.
201+
202+
## 9. Reproduce the diagnosis (read-only, safe)
203+
```bash
cd ~/Library/Metadata/CoreSpotlight
for dom in */index.spotlightV3; do
  for f in "$dom"/*.ivf-vector-indexes; do
    python3 - "$f" <<'PY'
import struct, sys

# Read the record as little-endian u32 words, ignoring any trailing partial word.
path = sys.argv[1]
b = open(path, 'rb').read()
v = list(struct.unpack('<%dI' % (len(b) // 4), b[:len(b) // 4 * 4]))
if len(v) < 2:
    sys.exit(f"{path}: too short ({len(b)} bytes)")
# v[0] = generation, v[1] = magic, then (id, type) pairs; zero ids are padding.
gen = v[0]
ids = [v[i] for i in range(2, len(v) - 1, 2) if v[i] != 0]
print(f"{path:70s} gen={gen:4d} live_ids={ids}")
PY
  done
done
```
216+
Compare generation spread + terminal live set per domain; the broken one has a huge gap and a
217+
never-draining id.
Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
1+
# Spotlight `8tsd` deep validation — design proposal + implementation plan
2+
3+
**Date:** 2026-06-05
4+
**Author:** validate LLM
5+
**Re:** project-manager briefing `inbox/2026-06-05-spotlight-index-deep-validation-opportunity.md`
6+
**Status:** proposal (investigation complete; awaiting greenlight before validator code)
7+
8+
## Executive summary
9+
10+
The briefing's core technical premise was **wrong in one load-bearing way**, which I caught by
11+
inspecting the real bytes on Peter's machine before designing:
12+
13+
> **None of the Spotlight `*.db` files are SQLite.** Every `store.db`, `skg_store.db`, and
14+
> `embedding_store.db` under `~/Library/Metadata/CoreSpotlight/` begins with magic `38 74 73 64`
15+
> = **`8tsd`**, Apple's proprietary Spotlight store format — *not* `SQLite format 3`. `grep` for
16+
> `SQLite format 3` across `store.db` returns **0** hits.
17+
18+
So the briefing's **Track A as written** ("open store.db with sqlite3, walk tables, cross-reference")
19+
**cannot work** — there is no SQLite layer. This proposal revises Track A onto the correct
20+
foundation: parse the `8tsd` format directly. The good news: `8tsd` is **well reverse-engineered
21+
publicly** (Yogesh Khatri's `spotlight_parser`), and Peter's actual header bytes match that spec
22+
exactly — so we are *not* starting from zero.
23+
24+
Peter's hypothesis ("forked SQLite + Apple magic") is **right in spirit, wrong in letter**: it IS a
25+
paged store with a b-tree-like block directory (conceptually SQLite-ish), but it is Apple's own
26+
on-disk layout, not SQLite internally.
27+
28+
## Verified facts (against Peter's real files)
29+
30+
`8tsd` header (Khatri's offsets, confirmed byte-for-byte on `Priority/index.spotlightV3/store.db`):
31+
32+
| Offset | Field | Peter's value |
|---|---|---|
| 0x00 | magic | `8tsd` |
| 0x04 | flags | `0x10801` |
| 0x24 | header_size | 4096 |
| 0x28 | block0_size | 16384 |
| 0x2C | block_size | 16384 |
| 0x30–0x40 | index_blocktype 0x11/0x21/0x41/0x81×2 | all 0 (V3 variant; note, not necessarily corruption) |
| 0x144 | original_path (256B) | `/System/Volumes/Data/Users/pmarreck/Library/Metadata/CoreSpotlight/Priority/index.spotlightV3/store.db` |
41+
42+
- **Block 0** (at offset = header_size = 4096) has magic `1mbd` ("dbm1" map block), `block0_size=0x4000`,
43+
`item_count=602` — the record-block directory. Matches Khatri exactly.
44+
- Regular blocks: magic `2pbd` ("dbp2"), per-block compression (LZ4 `bv4*` @ type&0x1000, LZFSE
45+
`bvx*` @ &0x2000, else zlib `0x78`), varint-encoded records keyed against 0x11/0x21/0x81 tables.
46+
- **`.ivf-vector-indexes`**: tiny (40 bytes on Peter's machine for `live.2`), a sequence of
47+
little-endian reference records, NOT bulk vectors. Format is **genuinely undocumented publicly**
48+
(confirmed dead-end via DFIR/forensics search) — true from-scratch RE for Track B.
49+
50+
Canonical reference: Yogesh Khatri `spotlight_parser.py`
51+
(https://github.com/ydkhatri/spotlight_parser) — trust its struct offsets over libyal/dtformats
52+
where they conflict (verified against Peter's bytes).
53+
54+
## Revised tracks
55+
56+
### Track A′ — `8tsd` structural-integrity validator (no IVF RE needed) [SHIP v1]
57+
58+
Replace the magic-only `validateSpotlight` stub (`apple_validators.zig:352-375`) with a real
59+
structural walk that catches corruption from the format side:
60+
61+
1. Parse + sanity-check the header (magic, version 8tsd vs 7tsd, header_size/block_size sane,
62+
`original_path` is valid UTF-8 and plausibly a path).
63+
2. Parse block 0 (`1mbd`/`2mbd` map): verify magic, `item_count`, each directory entry's
64+
`offset_index * block_size` lands within the file.
65+
3. Walk regular blocks: verify each `2pbd` signature, `physical_size`/`logical_size` ≤ block_size,
66+
`next_block_index` chains terminate (no cycles, no out-of-range), and **each block decompresses
67+
successfully** (LZ4/LZFSE/zlib by type bits) to its declared `logical_size`.
68+
4. Emit honest depth: full structural validation = `.full` when all blocks parse+decompress;
69+
`okWithDepthAndWarning` for unsupported sub-variants (no silent skip).
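
The chain check in step 3 is a standard visited-set walk. A minimal Python sketch, assuming the successor of each block has already been read into a mapping and that a `next_block_index` of 0 means "end of chain"; that terminator value and the name `walk_chain` are both assumptions.

```python
def walk_chain(start, next_index, block_count, end_marker=0):
    """Follow next_block_index links from start; raise on cycles or bad indexes.

    next_index maps a block index to the index of its successor.
    end_marker is the value taken to mean "no successor" (an assumption).
    Returns the list of block indexes visited, in order.
    """
    visited = set()
    order = []
    current = start
    while current != end_marker:
        if not 0 <= current < block_count:
            raise ValueError(f"block index {current} out of range")
        if current in visited:
            raise ValueError(f"cycle detected at block {current}")
        visited.add(current)
        order.append(current)
        current = next_index.get(current, end_marker)
    return order
```

Raising on the first bad link gives the validator a specific block index to report instead of a generic "corrupt" verdict.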
70+
71+
This is the corruption surface for the stuck-state class. ~300–500 lines. Pure-Zig (we already
72+
vendor zlib; LZ4/LZFSE blocks may need a decoder — assess: zlib-only stores may suffice for v1,
73+
WARN on LZFSE/LZ4 blocks until a decoder is wired).
74+
75+
### Track A″ — orphan-tombstone cross-reference [SHIP v1, the actual Peter bug]
76+
77+
The `mds_stores` `IVFVectorIndex::unlink` loop is the index trying to delete a record whose backing
78+
file is gone (or vice-versa). Detect it WITHOUT full IVF parsing:
79+
80+
1. From the `8tsd` store, enumerate the record IDs / references the index believes it owns.
81+
2. Enumerate the `.ivf-vector-indexes` and `live.N.*` rotation files actually present on disk.
82+
3. **Cross-check:** flag store records referencing a missing index file (orphan → the unlink
83+
target that never dies), and index files with no owning store record (stale → garbage).
84+
4. Report the specific dangling reference. That's the smoking gun.
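
Once both sides have been enumerated, the cross-check in step 3 is a pair of set differences. The sketch below assumes references and on-disk files can be reduced to comparable keys (for example file names), which is exactly the store→IVF link the caveat below calls least documented; `cross_check` is a hypothetical name.

```python
def cross_check(store_refs, files_on_disk):
    """Split mismatches into orphans (referenced, missing) and stale (present, unowned)."""
    store_refs, files_on_disk = set(store_refs), set(files_on_disk)
    return {
        "orphans": sorted(store_refs - files_on_disk),
        "stale": sorted(files_on_disk - store_refs),
    }


report = cross_check(
    store_refs={"live.1.ivf-vector-indexes", "live.2.ivf-vector-indexes"},
    files_on_disk={"live.2.ivf-vector-indexes", "live.3.ivf-vector-indexes"},
)
print(report)
```

The file names in the usage lines are made-up illustration data, not values from Peter's machine.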
85+
86+
Caveat: the store→IVF reference mechanism is the least-documented link. v1 may start with the
87+
coarser "does every `live.N`/`0.ivf-vector-indexes` referenced by rotation metadata exist" check
88+
and tighten as RE deepens.
89+
90+
### Track B — `.ivf-vector-indexes` format RE [BIG SHIP, Peter-sanctioned]
91+
92+
From-scratch RE (no public docs):
93+
- Disassemble `IVFVectorIndex.framework/.../IVFVectorIndex`; locate `unlink` (the error is
94+
`unlink:4752`), `load`/`save` paths; recover the on-disk struct via `nm`/`dwarfdump` + the
95+
binary's Obj-C++ class metadata.
96+
- Diff known-good vs known-broken `.ivf-vector-indexes` (Peter's machine provides both — broken is
97+
the one mds_stores loops on).
98+
- Produce `src/core/spotlight_ivf.zig` + a public format spec (defensible portfolio artifact;
99+
zero public prior art).
100+
101+
### Bonus — diagnose Peter's machine NOW [unblocks iMessage search, months-broken]
102+
103+
Read-only tool: parse Peter's CoreSpotlight `8tsd` stores + IVF reference records, find the orphan/dangling tombstone feeding the `unlink:4752` loop, and print exactly which file to remove (or which domain's `index.spotlightV3` dir to move aside; `mdutil -E` itself operates on whole volumes, not on a single directory) to break the loop. Even if throwaway, it ends a months-long blocker.
107+
108+
## Architecture (per Peter's rules)
109+
110+
- Logic in pure-Zig core: new `src/core/spotlight_store.zig` (8tsd parser) + extend
111+
`apple_validators.zig` `validateSpotlight` to call it. IVF → `src/core/spotlight_ivf.zig` (Track B).
112+
- Route through C FFI; no Zig CLI bypass.
113+
- TDD: Peter's real `store.db` (copied read-only to fixtures, or a trimmed synthetic) as the
114+
known-good; a byte-corrupted copy as known-bad. Oracle cross-check against Khatri's
115+
`spotlight_parser.py` output where possible.
116+
- Never touch the live files mds_stores is fighting — always operate on RAM/tmp copies.
117+
118+
## Suggested order (Peter said "all of these, in this order")
119+
120+
1. **Bonus diagnosis** (today; unblocks Peter, validates the 8tsd+IVF reading end-to-end).
121+
2. **Track A′** (8tsd structural validator → real `validateSpotlight`).
122+
3. **Track A″** (orphan-tombstone cross-reference).
123+
4. **Track B** (IVF RE → spotlight_ivf.zig + spec).
124+
125+
## Risks / unknowns
126+
127+
- store→IVF reference link is the least-documented; Track A″ precision improves with Track B.
128+
- LZFSE/LZ4 block decompression may need a Zig decoder; v1 can WARN on those blocks (no silent skip).
129+
- IVF RE is open-ended; timebox and checkpoint.
