Commit 699a31a

xcarbo and claude committed
Merge upstream/develop (28 commits) into xdev-patches
Catches up on a month of upstream work. Highlights pulled in:
- MemPalace#1306 searcher: hybrid candidate union (vector ∪ BM25 reranking pool)
- MemPalace#1325 mcp security: omit absolute paths from tool_get_drawer / tool_status
- MemPalace#1322 chroma: wire quarantine_stale_hnsw to prevent SIGSEGV on stale HNSW
- MemPalace#1320 mcp: forward valid_to / source params in kg_add / kg_invalidate
- MemPalace#1321 cli: honor --palace flag in cmd_init
- MemPalace#1314 kg temporal params fix
- MemPalace#1244 cli: cmd_compress writes to mempalace_closets so palace can read
- MemPalace#1243 mcp: case-insensitive agent name in diary_write/diary_read
- MemPalace#1303 mcp_server: pass embedding_function= on collection reopen
- MemPalace#1076/MemPalace#1077 hooks: quote CLAUDE_PLUGIN_ROOT / CODEX_PLUGIN_ROOT in hooks.json
- Various ruff format passes on touched files

Conflict resolution (CHANGELOG.md only — code files all auto-merged):
- 3.3.5 unreleased section header from upstream kept above 3.3.4
- 3.3.4 section: kept our 2026-04-30 release date; merged upstream's new MemPalace#1299 SIGSEGV-on-default-EF entry in alongside our existing topic-tunnels (MemPalace#1194/MemPalace#1195/MemPalace#1197), HNSW-bloat (MemPalace#1191), max_seq_id (MemPalace#1135), and auto-ingest (MemPalace#1230/MemPalace#1231) entries. Kept our richer topic-tunnels detail (upstream's version was a strict subset).

xdev patches preserved (still on this branch, untouched by merge):
- 6ef44cb fix(hooks): route CC transcripts via convo_miner with cwd-based wings
- 3fad61d fix(config): allow leading dash in wing names

Not pushed to origin — run tests locally and decide when to push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 parents df60558 + 1888b67 commit 699a31a

12 files changed

Lines changed: 970 additions & 42 deletions

.claude-plugin/hooks/hooks.json

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
 "hooks": [
   {
     "type": "command",
-    "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/mempal-stop-hook.sh"
+    "command": "bash \"${CLAUDE_PLUGIN_ROOT}/hooks/mempal-stop-hook.sh\""
   }
 ]
 }
@@ -16,7 +16,7 @@
 "hooks": [
   {
     "type": "command",
-    "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/mempal-precompact-hook.sh"
+    "command": "bash \"${CLAUDE_PLUGIN_ROOT}/hooks/mempal-precompact-hook.sh\""
   }
 ]
 }

.codex-plugin/hooks.json

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,7 @@
 "hooks": [
   {
     "type": "command",
-    "command": "${CODEX_PLUGIN_ROOT}/hooks/mempal-hook.sh session-start"
+    "command": "\"${CODEX_PLUGIN_ROOT}/hooks/mempal-hook.sh\" session-start"
   }
 ]
 }
@@ -17,7 +17,7 @@
 "hooks": [
   {
     "type": "command",
-    "command": "${CODEX_PLUGIN_ROOT}/hooks/mempal-hook.sh stop"
+    "command": "\"${CODEX_PLUGIN_ROOT}/hooks/mempal-hook.sh\" stop"
   }
 ]
 }
@@ -28,7 +28,7 @@
 "hooks": [
   {
     "type": "command",
-    "command": "${CODEX_PLUGIN_ROOT}/hooks/mempal-hook.sh precompact"
+    "command": "\"${CODEX_PLUGIN_ROOT}/hooks/mempal-hook.sh\" precompact"
   }
 ]
 }
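
The quoting added in both hooks.json files matters when the plugin root contains whitespace. The snippet below is only a rough illustration: it uses Python's `shlex` as an approximation of POSIX-style word splitting, the path `/Users/me/My Plugins` is a made-up example, and the real hook runner may expand the variable differently. Under those assumptions, the unquoted command breaks the script path into two words, while the quoted one keeps it intact.

```python
import shlex

# Hypothetical plugin root containing a space (not taken from the commit).
plugin_root = "/Users/me/My Plugins"

# Old form: the path is substituted into the command with no quoting.
unquoted = f"bash {plugin_root}/hooks/mempal-stop-hook.sh"
# New form: the path is wrapped in double quotes, as in the patched hooks.json.
quoted = f'bash "{plugin_root}/hooks/mempal-stop-hook.sh"'

unquoted_argv = shlex.split(unquoted)
quoted_argv = shlex.split(quoted)

print(unquoted_argv)  # three words: the script path is split at the space
print(quoted_argv)    # two words: bash plus the intact script path
```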

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -6,6 +6,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 
 ---
 
+## [3.3.5] — unreleased
+
+### Bug Fixes
+
+- **`mempalace_diary_read` silently dropped entries on agent-name case mismatch.** `tool_diary_write` stored the `agent` metadata verbatim after `sanitize_name`, which preserves case, while `tool_diary_read` filtered by exact match. Writing as `"Claude"` and reading as `"claude"` (or vice-versa) returned zero rows. Both endpoints now lowercase `agent_name` immediately after sanitization, so reads are case-insensitive and the default per-agent wing slug is stable across casings. **Behavior change:** entries written prior to this fix under mixed-case agent names will not match the new lowercase filter; run `mempalace repair` if you need to migrate legacy diary metadata. (#1243)
+
+---
+
 ## [3.3.4] — 2026-04-30
 
 ### Added
@@ -19,6 +27,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 
 ### Bug Fixes
 
+- **MCP server `tool_diary_write` SIGSEGV when default EF provider differs.** `mcp_server._get_collection` bypassed `ChromaBackend.get_collection` and called `client.get_collection` / `client.create_collection` without `embedding_function=`. ChromaDB 1.x persists the EF *identity* (its `name()`) with the collection but not the EF *instance/configuration*, so the MCP server's reopen silently bound chromadb's built-in `DefaultEmbeddingFunction` — its `name()` matches `mempalace.embedding`'s spoofed `"default"` so the identity check passes, but its provider list is chromadb's default rather than the user's resolved device. The miner / Stop hook ingest path routes through the backend helper and binds the configured EF instead. On bleeding-edge interpreters (python 3.14 + chromadb 1.5.x on Apple Silicon) the default provider selection could SIGSEGV the host process on first `col.add()`, killing the MCP stdio server and leaving every subsequent tool call returning `Connection closed` until Claude Code was relaunched. `_get_collection` now reuses `ChromaBackend._resolve_embedding_function()` on the reopen branches that actually open a collection (warm-cache reads stay zero-cost), matching the miner/backend path. (#1299, follow-up to #1262 / #1289)
 - **Cross-wing topic tunnels for hyphenated dir names.** `mempalace init` recorded the `topics_by_wing` registry key under the raw directory name (e.g. `mempalace-public`), while `mempalace.yaml`'s `wing` field used the lower-cased + separator-collapsed slug (`mempalace_public`). At mine time the miner read the slug from the yaml and missed the registry, so `_compute_topic_tunnels_for_wing` returned `0` silently. Real-world: any project whose folder contained a hyphen or space lost every topic tunnel. Producer side: `cmd_init`, `room_detector_local`, `miner.load_config` no-yaml fallback, and `convo_miner` now all route through a shared `normalize_wing_name()` in `config.py` so future writes use the same key. Lookup side: `palace_graph.create_tunnel`, `list_tunnels`, `follow_tunnels`, and `find_tunnels` normalize incoming wing names too, so existing palaces with raw-name keys on disk also recover. (#1194, #1195, #1197, follow-up to #1180)
 - **HNSW index bloat from repeated resize+persist cycles.** ChromaDB's HNSW segment was growing into the tens of GB on palaces past ~15K drawers because `link_lists.bin` was being re-allocated on every flush. Setting `hnsw:batch_size` and `hnsw:sync_threshold` on collection metadata via the new `_HNSW_BLOAT_GUARD` constant pins the segment to one allocation per batch instead. Empirical: a fresh 39,792-drawer palace went from 30 GB on disk and segfaulting `mempalace status` to 376 MB and instant. Migration note — already-bloated palaces still need a `mempalace repair` or full re-mine; HNSW config is honoured at collection-create time only. (#1191, supersedes #346)
 - **`max_seq_id` poisoning from old `_fix_blob_seq_ids` shim.** The 0.6.x → 1.5.x BLOB-to-INTEGER migration was running `int.from_bytes(blob, 'big')` over chromadb 1.5.x's native `b'\x11\x11' + ASCII-digit` `max_seq_id` format, yielding ~1.23e18 integers that silently suppressed every subsequent `embeddings_queue` write for the affected segment. The shim is now narrowed to the `embeddings` table only, with an additional defense-in-depth guard that skips sysdb-10-prefixed BLOBs even there. New `mempalace repair --mode max-seq-id` un-poisons existing palaces either from a pre-corruption sidecar DB (exact restore) or heuristically (`MAX(embeddings.seq_id)` over the owning collection). (#1135)
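
The `max_seq_id` entry above can be reproduced with plain Python arithmetic. This is a small sketch, assuming only the `b'\x11\x11' + ASCII-digit` layout that the changelog describes. The six-digit value is made up for illustration, and the decoding on the "intended" line is an assumption about what the digits mean, not chromadb's actual parser.

```python
# Assumed example value: the changelog describes the format as b'\x11\x11'
# followed by ASCII digits. The digits used here are invented.
blob = b"\x11\x11" + b"123456"

# What the old shim effectively did: read all bytes as one big-endian integer.
poisoned = int.from_bytes(blob, "big")

# Assumed intended reading: the ASCII digits that follow the two-byte prefix.
intended = int(blob[2:].decode("ascii"))

print(f"{poisoned:.3e}")  # on the order of 1.23e18
print(intended)           # 123456
```

Because the two prefix bytes land in the most significant positions of an eight-byte value, the result sits near 1.23e18 whatever the digits are, which is consistent with the magnitude quoted in the entry.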

mempalace/backends/chroma.py

Lines changed: 27 additions & 5 deletions
@@ -993,7 +993,7 @@ def _client(self, palace_path: str):
         )
 
         if cached is None or inode_changed or mtime_changed or mtime_appeared:
-            _fix_blob_seq_ids(palace_path)
+            ChromaBackend._prepare_palace_for_open(palace_path)
             cached = chromadb.PersistentClient(path=palace_path)
             self._clients[palace_path] = cached
             # Re-stat after the client constructor runs: chromadb creates
@@ -1028,6 +1028,31 @@ def _client(self, palace_path: str):
     # safety property; locking would add cost without correctness gain.
     _quarantined_paths: set[str] = set()
 
+    @staticmethod
+    def _prepare_palace_for_open(palace_path: str) -> None:
+        """Run the pre-open safety pass shared by :meth:`make_client` and
+        :meth:`_client`.
+
+        Two steps, both required before constructing a ``PersistentClient``:
+
+        1. ``_fix_blob_seq_ids`` — repairs the BLOB seq_id quirk that bites
+           certain chromadb migrations.
+        2. ``quarantine_stale_hnsw`` — gated by :attr:`_quarantined_paths` so
+           it fires once per palace per process. This is the SIGSEGV
+           prevention path for stale HNSW segments (see #1121, #1132, #1263);
+           wiring it through this helper means CLI mining, search, repair,
+           and status all benefit, not just the legacy ``make_client``
+           callers.
+
+        Idempotent: safe to call from any code path that is about to open or
+        re-open a palace. The ``_quarantined_paths`` gate prevents thrash on
+        hot paths (e.g. ``_client()`` is called on every backend operation).
+        """
+        _fix_blob_seq_ids(palace_path)
+        if palace_path not in ChromaBackend._quarantined_paths:
+            quarantine_stale_hnsw(palace_path)
+            ChromaBackend._quarantined_paths.add(palace_path)
+
     @staticmethod
     def make_client(palace_path: str):
         """Create a fresh ``PersistentClient`` (fixes BLOB seq_ids first).
@@ -1040,10 +1065,7 @@ def make_client(palace_path: str):
         :attr:`_quarantined_paths` for the rationale (cold-start protection
         vs. runtime thrash on steady-write daemons).
         """
-        _fix_blob_seq_ids(palace_path)
-        if palace_path not in ChromaBackend._quarantined_paths:
-            quarantine_stale_hnsw(palace_path)
-            ChromaBackend._quarantined_paths.add(palace_path)
+        ChromaBackend._prepare_palace_for_open(palace_path)
         return chromadb.PersistentClient(path=palace_path)
 
     @staticmethod
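
The once-per-process gate in `_prepare_palace_for_open` can be modeled in isolation. The sketch below is a toy stand-in, not the project's code: the `Backend` class, its two recording helpers, and the `/tmp/palace` path are hypothetical, and only the control flow mirrors the diff (the seq_id fix runs on every call, the quarantine pass runs once per path).

```python
class Backend:
    """Toy stand-in for ChromaBackend; only the gating logic is modeled."""

    _quarantined_paths: set[str] = set()
    calls: list[tuple[str, str]] = []  # (step name, path) for every step that ran

    @staticmethod
    def _fix_blob_seq_ids(path: str) -> None:
        # Hypothetical placeholder: just records that the step ran.
        Backend.calls.append(("fix", path))

    @staticmethod
    def _quarantine_stale_hnsw(path: str) -> None:
        # Hypothetical placeholder: just records that the step ran.
        Backend.calls.append(("quarantine", path))

    @staticmethod
    def prepare_for_open(path: str) -> None:
        # Step 1 runs unconditionally on every open.
        Backend._fix_blob_seq_ids(path)
        # Step 2 is gated so it fires once per path per process.
        if path not in Backend._quarantined_paths:
            Backend._quarantine_stale_hnsw(path)
            Backend._quarantined_paths.add(path)


# Simulate a hot path that prepares the same palace three times.
for _ in range(3):
    Backend.prepare_for_open("/tmp/palace")

fix_runs = [c for c in Backend.calls if c[0] == "fix"]
quarantine_runs = [c for c in Backend.calls if c[0] == "quarantine"]
print(len(fix_runs), len(quarantine_runs))  # 3 1
```

The gate is a plain class-level set with no lock, which matches the context comment in the diff stating that locking would add cost without a correctness gain.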

mempalace/cli.py

Lines changed: 10 additions & 4 deletions
@@ -232,6 +232,13 @@ def cmd_init(args):
     from .project_scanner import discover_entities
     from .room_detector_local import detect_rooms_local
 
+    # Honor --palace (issue #1313): without this, init silently ignored the
+    # flag and always used ~/.mempalace. Mirror the env-var pattern used by
+    # mcp_server.py so every downstream read of ``cfg.palace_path`` (Pass 0,
+    # cfg.init(), the post-init mine) routes to the user-specified location.
+    if getattr(args, "palace", None):
+        os.environ["MEMPALACE_PALACE_PATH"] = os.path.abspath(os.path.expanduser(args.palace))
+
     cfg = MempalaceConfig()
 
     # Resolve entity-detection languages: --lang overrides config.
@@ -310,8 +317,7 @@ def cmd_init(args):
             )
         except LLMError as e:
             print(
-                f" LLM init failed ({e}). "
-                f"Running heuristics-only — pass --no-llm to silence this."
+                f" LLM init failed ({e}). Running heuristics-only — pass --no-llm to silence this."
             )
 
     # Pass 0: detect whether the corpus is AI-dialogue. Writes
@@ -902,7 +908,7 @@ def cmd_compress(args):
     # Store compressed versions (unless dry-run)
     if not args.dry_run:
         try:
-            comp_col = backend.get_or_create_collection(palace_path, "mempalace_compressed")
+            comp_col = backend.get_or_create_collection(palace_path, "mempalace_closets")
             for doc_id, compressed, meta, stats in compressed_entries:
                 comp_meta = dict(meta)
                 comp_meta["compression_ratio"] = round(stats["size_ratio"], 1)
@@ -913,7 +919,7 @@ def cmd_compress(args):
                     metadatas=[comp_meta],
                 )
             print(
-                f" Stored {len(compressed_entries)} compressed drawers in 'mempalace_compressed' collection."
+                f" Stored {len(compressed_entries)} compressed drawers in 'mempalace_closets' collection."
             )
         except Exception as e:
             print(f" Error storing compressed drawers: {e}")
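
The `--palace` change in `cmd_init` follows a simple pattern: normalize the flag value and export it before any config object is built. Here is a minimal sketch of that pattern. `resolve_palace_env` and `read_palace_path` are hypothetical helpers invented for this example, since `MempalaceConfig` itself is not shown in the diff. The environment variable name and the `~/.mempalace` default come from the diff, and the `~/palaces/work` argument is made up.

```python
import argparse
import os


def resolve_palace_env(args: argparse.Namespace) -> None:
    """Export the --palace value so later config reads pick it up."""
    if getattr(args, "palace", None):
        os.environ["MEMPALACE_PALACE_PATH"] = os.path.abspath(
            os.path.expanduser(args.palace)
        )


def read_palace_path() -> str:
    # Hypothetical stand-in for how a config object might resolve the path;
    # the fallback mirrors the ~/.mempalace default named in the diff comment.
    return os.environ.get(
        "MEMPALACE_PALACE_PATH", os.path.expanduser("~/.mempalace")
    )


parser = argparse.ArgumentParser()
parser.add_argument("--palace")
args = parser.parse_args(["--palace", "~/palaces/work"])

resolve_palace_env(args)
resolved = read_palace_path()
print(resolved)  # absolute path ending in palaces/work
```

Setting the variable before the config is constructed means every later reader sees the same location without threading an extra argument through each call, which appears to be the motivation stated in the diff comment.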
