
fix(repair): rebuild_index bails on col.count() — exactly the call HNSW corruption breaks #1308

@potterdigital

Description

Summary

mempalace repair --mode legacy cannot repair palaces whose corruption manifests at Collection.count() — which is the most common HNSW corruption symptom users hit (status, search, and repair all fail with the same error). The repair tool calls col.count() as its first step, so it bails on the exact case it most needs to handle.

Reproducer

Version: mempalace 3.3.4 (pipx, Python 3.12.7), macOS 14 (Darwin 25.3.0), ChromaDB pinned to current.

State: A palace built on 0.6.x and used continuously through several upgrades (52,300 embeddings, two collections: mempalace_drawers + mempalace_closets). All drawer data is present and readable in chroma.sqlite3; only the HNSW index/WAL is corrupt.

$ mempalace status
chromadb.errors.InternalError: Error executing plan: Error sending backfill
request to compactor: Failed to apply logs to the hnsw segment writer

$ mempalace repair
=======================================================
  MemPalace Repair
=======================================================
  Palace: /Users/.../.mempalace/palace
  Error reading palace: Error executing plan: Error sending backfill request
  to compactor: Failed to apply logs to the hnsw segment writer
  Cannot recover — palace may need to be re-mined from source files.

$ mempalace repair --mode max-seq-id --dry-run
  No poisoned max_seq_id rows detected. Nothing to do.

Root cause

mempalace/repair.py::rebuild_index (3.3.4):

backend = ChromaBackend()
try:
    col = backend.get_collection(palace_path, COLLECTION_NAME)
    total = col.count()                          # ← fails here on HNSW corruption
except Exception as e:
    print(f"  Error reading palace: {e}")
    print("  Palace may need to be re-mined from source files.")
    return

col.count() triggers ChromaDB's compactor, which tries to apply queued WAL log entries to the HNSW segment writer, and that is exactly where the corruption lives. The user is therefore told the palace "may need to be re-mined from source files" even though their data is fully intact in chroma.sqlite3. A SQLite query confirms it:

sqlite> SELECT segment_id, COUNT(*) FROM embeddings GROUP BY segment_id;
4ed454e5-...|4101
a95d2236-...|48199

Suggested fix

rebuild_index should bypass col.count() and source IDs/documents/metadata directly from SQLite when the chroma client raises an HNSW error. Sketch:

  1. Try col.count() / col.get() first (current path — fast and works on healthy palaces).
  2. On chromadb.errors.InternalError referencing the HNSW segment, fall through to a SQLite reader that pulls embeddings.embedding_id, embedding_metadata rows, and the document blob directly per collection.
  3. Then proceed with the existing delete_collection + recreate + upsert flow.
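Steps 1–2 could be wired as a small guard at the top of rebuild_index. A minimal sketch under stated assumptions: `sqlite_fallback` is a hypothetical callable standing in for the direct-SQLite reader, and matching on the error message is a stand-in for a proper isinstance check against chromadb.errors.InternalError (avoided here so the sketch has no chromadb import):

```python
def count_or_fallback(col, sqlite_fallback):
    """Try the fast client path first; on an error that looks like the
    HNSW segment failure, fall through to the SQLite reader instead of
    bailing. Any other error still propagates."""
    try:
        return col.count()
    except Exception as e:  # chromadb.errors.InternalError in practice
        if "hnsw" in str(e).lower():
            return sqlite_fallback()
        raise
```

The key point is that only HNSW-shaped failures take the fallback path, so healthy palaces keep the current fast behavior and unrelated errors still surface.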

This would also retire the "may need to be re-mined from source files" advice for any palace where the SQLite layer is intact, which is the common case.

Cross-refs

3.3.4's quarantine logic improves the client open path but doesn't intercept errors raised mid-operation by an already-loaded collection, which is what count()/query() hit. So this gap survives 3.3.4.

Workaround

For users in this state today: extract drawers directly from chroma.sqlite3, then re-upsert into a fresh collection. This is essentially what a fixed rebuild_index would do automatically.
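For reference, a minimal sketch of that extraction step. The table and key names (embeddings, embedding_metadata, chroma:document) reflect ChromaDB's SQLite layout as observed above; verify them against the pinned chromadb version before relying on this:

```python
import sqlite3

def read_drawers_from_sqlite(db_path, segment_id):
    """Pull (embedding_id, document, metadata) tuples straight from
    chroma.sqlite3, bypassing the chroma client and therefore the
    compactor/HNSW path entirely."""
    con = sqlite3.connect(db_path)
    records = []
    for pk, embedding_id in con.execute(
        "SELECT id, embedding_id FROM embeddings WHERE segment_id = ?",
        (segment_id,),
    ).fetchall():
        document, meta = None, {}
        for key, s, i, f in con.execute(
            "SELECT key, string_value, int_value, float_value "
            "FROM embedding_metadata WHERE id = ?",
            (pk,),
        ):
            value = s if s is not None else (i if i is not None else f)
            if key == "chroma:document":  # document text lives in metadata
                document = value
            else:
                meta[key] = value
        records.append((embedding_id, document, meta))
    con.close()
    return records
```

The returned tuples map directly onto the ids/documents/metadatas arguments of an upsert into a fresh collection, which is the same flow a fixed rebuild_index would run automatically.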
