Environment
mempalace 3.1.0 (pipx venv, installed directly from the Claude Code plugin marketplace source, not PyPI)
chromadb 0.6.3
Python 3.13 (Homebrew)
macOS 15.x ARM64 (Apple M1, 8-core)
Symptom
A background python3 -m mempalace mine $MEMPAL_DIR subprocess spawned by mempalace/hooks_cli.py::_maybe_auto_ingest crashes with EXC_BAD_ACCESS (SIGSEGV) inside hnswlib.cpython-313-darwin.so. The Stop hook itself exits cleanly — the mining subprocess dies detached, so Claude Code / VSCode sees a successful hook. The crash surfaces as a macOS crash report, not as a mempalace error message.
Crash fingerprint (unambiguous)
The unambiguous marker is repairConnectionsForUpdate in the crashed thread's stack — that function is only called on the HNSW update path (addItems on an already-existing label).
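A quick local scan for that fingerprint can be sketched as follows (a hedged helper, not part of mempalace; it assumes modern macOS writes `.ips` crash reports and the function name is mine):

```python
from pathlib import Path

def find_fingerprint(reports_dir: str = "~/Library/Logs/DiagnosticReports") -> list:
    # Scan macOS .ips crash reports for the update-path marker named above.
    hits = []
    for report in Path(reports_dir).expanduser().glob("*.ips"):
        if "repairConnectionsForUpdate" in report.read_text(errors="ignore"):
            hits.append(report.name)
    return sorted(hits)
```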
Root cause
Three independently reasonable design choices combine into a race:

1. Deterministic drawer_id (mempalace/miner.py:377): same file path + chunk index ⇒ same ID forever.
2. Upsert, not add (mempalace/miner.py:392): upsert on an existing ID pushes ChromaDB through hnswlib.addItems → addPoint's existing_internal_id branch → updatePoint → repairConnectionsForUpdate, which has a long-standing thread-safety bug in nmslib/hnswlib: searchBaseLayer dereferences a neighbor pointer that another worker thread is concurrently mutating.
3. File-level mtime skip (mempalace/miner.py:420 + mempalace/palace.py:51-71): the staleness check is per file, not per chunk. A single mtime change on one file ⇒ every chunk in that file is re-upserted with its unchanged deterministic drawer_id, producing a batch of update-path operations that ChromaDB runs concurrently through hnswlib's std::thread-based ParallelFor (≈ hardware_concurrency() workers, ~8 on M1).
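The ID collision in choice 1 can be sketched like this (a hypothetical reconstruction for illustration; mempalace's actual hashing scheme may differ):

```python
import hashlib

def drawer_id(source_file: str, chunk_index: int) -> str:
    # Same path + chunk index always hashes to the same ID, so every re-mine
    # of an unchanged chunk lands on an already-existing hnswlib label and
    # takes the update path instead of a fresh insert.
    return hashlib.sha256(f"{source_file}:{chunk_index}".encode()).hexdigest()[:16]
```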
Masking history
The race existed in earlier 3.x code but was masked because the mtime check was broken and file_already_mined always returned True. Commit c2308a1 "fix: address code review — restore mtime check, bound metadata reads, harden security" in the 2026-04-09 critical-bugfixes merge (PR #399, shipped as 3.1.0) re-enabled re-mining and unmasked the crash.
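The restored gate behaves roughly like this sketch (a hypothetical reconstruction; the real signature in mempalace/palace.py may differ):

```python
import os

def file_already_mined(path: str, recorded_mtime: float) -> bool:
    # Skip re-mining only when the on-disk mtime has not advanced past the
    # recorded one. The pre-c2308a1 bug effectively replaced this body with
    # `return True`, which is what masked the upsert race.
    return os.path.getmtime(path) <= recorded_mtime
```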
Reproducer
1. Install mempalace 3.1.0 on macOS ARM64 with Python 3.13 and chromadb 0.6.3.
2. Mine any project: mempalace mine <dir>
3. Modify a file inside the mined tree (e.g., touch <file> or echo "x" >> <file>).
4. Re-run mempalace mine <dir> — either manually or by letting the Claude Code Stop hook fire (if MEMPAL_DIR is exported).
5. Check ~/Library/Logs/DiagnosticReports/ for a crashed Python process with the fingerprint above.
On-disk safety
The crash does not corrupt the palace index. ChromaDB has not flushed hnswlib state at the time of the segfault; ~/.mempalace/palace/<uuid>/data_level0.bin and siblings stay at the last successful mine's mtime. No restore needed. Confirmed on this machine: palace dir timestamps stayed at the pre-crash mine time after the crash.
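That timestamp check can be repeated on any machine with a sketch like this (helper name and glob pattern are mine, assuming the palace layout quoted above):

```python
from pathlib import Path

def palace_untouched_since(palace_root: str, last_good_mine: float) -> bool:
    # True when no hnswlib segment file was modified after the last successful
    # mine, i.e. the crashed process died before any flush reached disk.
    return all(seg.stat().st_mtime <= last_good_mine
               for seg in Path(palace_root).glob("*/data_level0.bin"))
```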
Non-solutions
OMP_NUM_THREADS=1 does not help. hnswlib uses its own std::thread pool via ParallelFor, not OpenMP. The crash stack shows std::__1::thread::join with zero OpenMP frames.
Wiping ~/.mempalace/palace and re-mining from scratch only buys time — the race fires on the next modified file.
Downgrading to 3.0.x doesn't help; 3.0.x has the same code paths and the masking mtime-check bug wasn't a safety net.
Proposed fix
One-hunk patch in mempalace/miner.py::process_file, right before the for chunk in chunks loop:
```python
# Purge stale drawers for this file before re-inserting the fresh chunks.
# Converts modified-file re-mines from upsert-over-existing-IDs (which hits
# hnswlib's thread-unsafe updatePoint path and can segfault on macOS ARM with
# chromadb 0.6.3) into a clean delete+insert, bypassing the update path
# entirely.
try:
    collection.delete(where={"source_file": source_file})
except Exception:
    pass
```
This converts every re-mine into a pure INSERT path that bypasses updatePoint / repairConnectionsForUpdate entirely. PR to follow.
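To illustrate the difference with a toy in-memory stand-in (not ChromaDB's API; class and counter names are mine), only upserts over an existing ID would take the dangerous update path, and deleting first keeps that count at zero:

```python
class ToyCollection:
    """In-memory stand-in that counts how many operations hit an existing ID."""

    def __init__(self):
        self.store = {}
        self.update_ops = 0  # upserts over an existing label (the risky kind)

    def upsert(self, ids, docs):
        for i, d in zip(ids, docs):
            if i in self.store:
                self.update_ops += 1  # would take hnswlib's updatePoint path
            self.store[i] = d

    def delete(self, ids):
        for i in ids:
            self.store.pop(i, None)
```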
Related but distinct
- #74 (ChromaDB null pointer crash after ~8,400 drawers on macOS ARM64) crashes inside chromadb_rust_bindings.abi3.so, a different native library with a different fingerprint. On a palace of ~10K drawers (past #74's 8,400-drawer threshold), this crash fingerprint is absent, confirming #74 is unrelated.
- link_lists.bin resize bloat (disk exhaustion): a different failure mode.
- ParallelFor.