Concurrent writers (hooks + MCP server + CLI) corrupt the palace on chromadb 1.5.8 — sparse-file bloat + SIGSEGV #1092

@AndreyBelyy

Environment

  • mempalace 3.3.2 (installed via pipx, 2026-04-21 release)
  • chromadb 1.5.8 (latest)
  • Python 3.12.12
  • macOS 26.2 (25C56), ARM64 (Apple Silicon / M4)
  • Claude Code CLI + Claude desktop both connected to the same palace

Summary

When MemPalace's Claude Code hooks (SessionStart, Stop, PreCompact) spawn mempalace mine processes that run concurrently with an active MCP server (python -m mempalace.mcp_server) and/or an explicit CLI mempalace mine invocation, the chromadb 1.5.8 HNSW vector segment becomes corrupted. The corruption manifests as:

  1. Sparse-file bloat: an HNSW segment's link_lists.bin reports a small logical size (~300 KB) but allocates hundreds of gigabytes of disk blocks — in my case 362 GB (337 GiB) allocated for a 302,760-byte file. du -sh ~/.mempalace/palace/ reported 338 GB.
  2. Hard SIGSEGV: any subsequent open of the palace by chromadb crashes Python with EXC_BAD_ACCESS / KERN_INVALID_ADDRESS, because chromadb mmaps the full allocated region and reads run off the end of valid memory.

mempalace repair cannot recover the palace on its own, because chromadb itself crashes before the repair logic runs. Even after a manual surgical recovery followed by a successful mempalace repair --yes (34905 drawers rebuilt), chromadb still segfaults on the next read. Only deleting the palace entirely and re-mining fixes it, and the corruption reproduces within minutes of re-enabling the hooks.

Detection

stat -f "%N: logical=%z blocks=%b" ~/.mempalace/palace/*/link_lists.bin
du -sh ~/.mempalace/palace/
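
The same check can be scripted. Below is a minimal Python sketch, not MemPalace code: the function names, the `factor` and `slack` thresholds, and the `*/*.bin` glob (modelled on the path in the stat command above) are all my assumptions. It compares each file's logical size (`st_size`) with its allocated size (`st_blocks * 512`) and flags files where the allocation is wildly larger.

```python
import os
from pathlib import Path

BLOCK_SIZE = 512  # st_blocks is reported in 512-byte units on macOS and Linux


def is_bloated(logical, allocated, factor=10, slack=64 * 1024 * 1024):
    """True when allocated bytes far exceed logical bytes.

    factor and slack are arbitrary illustrative thresholds; the slack keeps
    tiny files (where one filesystem block dwarfs the content) from being
    flagged.
    """
    return allocated > logical * factor + slack


def find_bloated(palace_dir):
    """Return (path, logical, allocated) for suspicious *.bin files.

    Assumes segment files live one directory below palace_dir, as in
    ~/.mempalace/palace/*/link_lists.bin.
    """
    hits = []
    for path in sorted(Path(palace_dir).expanduser().glob("*/*.bin")):
        st = os.stat(path)
        allocated = st.st_blocks * BLOCK_SIZE
        if is_bloated(st.st_size, allocated):
            hits.append((str(path), st.st_size, allocated))
    return hits

# Example use:
#   for path, logical, allocated in find_bloated("~/.mempalace/palace"):
#       print(f"{path}: logical={logical} allocated={allocated}")
```

With the numbers from this report (302,760 logical bytes, roughly 362 GB allocated), `is_bloated` returns True; a healthy file of the same size that occupies a few hundred KB on disk is not flagged.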

Root cause (hypothesis)

chromadb 1.5.8's local HNSW segment writer is not safe under concurrent writers across processes. MemPalace currently allows three concurrent writer paths: (1) hooks spawning mempalace mine, (2) MCP server writes, (3) user-invoked CLI mempalace mine. No cross-process lock prevents them from writing to the HNSW segment simultaneously.

Suggestions

  • Serialize all writers via a cross-process lock (flock on the palace dir).
  • Or route all writes through the MCP server (CLI + hooks become MCP clients).
  • Add a startup sanity check that warns when an HNSW *.bin file's allocated size (blocks × 512 bytes) far exceeds its logical size.
  • Fsync the HNSW index after each batch so mid-write crashes don't leave sparse garbage.
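
To illustrate the first suggestion, here is a minimal Python sketch of a cross-process writer lock built on `fcntl.flock`. It is not existing MemPalace code: the helper name `palace_write_lock` and the `.writer.lock` file name are hypothetical, and it locks a dedicated file inside the palace directory rather than the directory itself. It only helps if every writer path (hooks, MCP server, CLI) takes the same lock before touching chromadb.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def palace_write_lock(palace_dir, lock_name=".writer.lock"):
    """Hold an exclusive advisory lock for the duration of the with-block.

    palace_write_lock and lock_name are hypothetical names for illustration.
    flock is advisory: it only serializes processes that also take this lock.
    """
    lock_path = os.path.join(palace_dir, lock_name)
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Blocks until no other process holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock, including on error paths.
        os.close(fd)

# Example use around any code that writes to the palace:
#   with palace_write_lock(os.path.expanduser("~/.mempalace/palace")):
#       ...  # perform the chromadb writes here
```

Because the kernel drops the lock when the holding process exits, a crashed writer would not leave a stale lock behind. A non-blocking variant (`fcntl.LOCK_EX | fcntl.LOCK_NB`) could let hooks skip a mine run instead of queueing behind a long one.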

Happy to share the full Python crash report on request.

Labels: bug, storage
