Commit 39e09d5

merge: upstream/develop (2026-04-27)
22 commits from upstream/develop, including:
- HNSW capacity divergence detection + BM25-only sqlite fallback when the vector layer is unloadable (MemPalace#1222 / MemPalace#1227; the `vector_disabled` kwarg routes search through the chromadb-bypass path on segfault risk).
- HNSW index bloat prevention via batch_size + sync_threshold metadata on collection create (MemPalace#1191).
- Hooks: always mine the active transcript as convos, additive to MEMPAL_DIR (MemPalace#1230 / MemPalace#1231). Restructured into a `_get_mine_targets()` list approach + a separate `_ingest_transcript()`. Daemon-strict gate preserved on each entry point.
- Wing-name normalization for hyphenated dirs across miner / convo_miner / palace_graph (MemPalace#1194 / MemPalace#1195 / MemPalace#1197).
- Narrow `_fix_blob_seq_ids` shim + `repair --mode max-seq-id` for legacy 0.6.x BLOB-poisoned palaces (MemPalace#1135).

Conflict resolution:
- README.md: kept fork-shaped narrative, dropped upstream's sweep tip injection (per fork-readme handling memory).
- hooks/README.md: adopted upstream's accurate `MEMPAL_DIR` additive description; kept `MEMPAL_PYTHON` env-var name (matches actual `hooks/*.sh` scripts in both forks).
- mempalace/cli.py: consolidated duplicate `--mode` argparse declarations into one with all 4 choices (rebuild/legacy/reorganize/max-seq-id).
- mempalace/hooks_cli.py: adopted upstream's `_get_mine_targets()` + `_ingest_transcript()` shape; added a `_daemon_strict()` guard at entry of `_maybe_auto_ingest`, `_mine_sync`, and `_ingest_transcript` so the daemon-strict architecture still skips local writes.
- mempalace/mcp_server.py: kept both `kind=` and `vector_disabled=` kwargs on the `search_memories` call.
- mempalace/searcher.py: kept fork's `_count_in_scope`, `_sqlite_fallback_and_scope`, `_apply_kind_text_filter` AND upstream's `_bm25_only_via_sqlite`. Both `kind` and `vector_disabled` parameters on `search_memories`.

Tests: 1510 passing (up from 1366; upstream brought new test suites).
2 parents 55dbdd8 + de7801e commit 39e09d5

24 files changed

Lines changed: 2512 additions & 174 deletions

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 
 ### Bug Fixes
 
+- **Cross-wing topic tunnels for hyphenated dir names.** `mempalace init` recorded the `topics_by_wing` registry key under the raw directory name (e.g. `mempalace-public`), while `mempalace.yaml`'s `wing` field used the lower-cased + separator-collapsed slug (`mempalace_public`). At mine time the miner read the slug from the yaml and missed the registry, so `_compute_topic_tunnels_for_wing` returned `0` silently. Real-world: any project whose folder contained a hyphen or space lost every topic tunnel. Now both call sites route through a shared `normalize_wing_name()` in `config.py`. (#1194, follow-up to #1180)
 - **CLI `mempalace search` retrieval quality.** The CLI was using pure ChromaDB cosine distance with no BM25 rerank, so drawers containing every query term but embedding as noise (directory listings, diff output, shell logs) scored `Match: 0.0` alongside genuinely irrelevant results with no way to tell them apart. Wired the CLI through the same `_hybrid_rank` the `mempalace_search` MCP tool already used, and surfaced both `cosine=` and `bm25=` scores in the output so users see which component of the match is firing. MCP search was unaffected; this fixes the human-facing CLI parity gap.
 - **Legacy-palace distance-metric warning.** CLI search now detects palaces created before `hnsw:space=cosine` was consistently set and prints a one-line notice pointing at `mempalace repair`. Without the warning such palaces silently used L2 distance, under which the similarity display floored every result to `Match: 0.0`. New palaces mined today already set cosine correctly and now have invariant tests pinning that behavior so future refactors can't silently regress it. (#1179)
 - **Graceful Ctrl-C during `mempalace mine`.** Interrupting a long mine no longer dumps a multi-frame `KeyboardInterrupt` traceback. The main file-processing loop now catches the signal, prints `files_processed: N/M`, `drawers_filed: K`, and `last_file:` so the user knows what landed, then exits with code 130 (standard SIGINT). Already-filed drawers are upserted idempotently on re-mine via deterministic IDs, so resuming is safe. The hooks PID lock at `~/.mempalace/hook_state/mine.pid` is now also actively cleaned up in a `finally` when its entry points at us — clean exit, error, or interrupt — preventing the next hook fire from briefly waiting on a stale PID. (#1182)
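
Reviewer note: the slug rule in the added entry above (lower-cased, separators collapsed) can be sketched roughly as follows. This is an illustration only. The real `normalize_wing_name()` lives in `config.py` and is not part of this diff, so the function name `normalize_wing_name_sketch`, the separator set, and the regex are all assumptions.

```python
import re


def normalize_wing_name_sketch(raw: str) -> str:
    """Hypothetical stand-in for normalize_wing_name(); not the project's code."""
    # Assumption: hyphens, whitespace, and underscores all count as separators.
    slug = re.sub(r"[-\s_]+", "_", raw.strip().lower())
    return slug.strip("_")


# The registry key (written at init) and the yaml `wing` value (read at mine
# time) agree once both pass through the same function.
registry = {normalize_wing_name_sketch("mempalace-public"): ["topic-a"]}
wing_from_yaml = normalize_wing_name_sketch("mempalace_public")
print(registry.get(wing_from_yaml))
```

The point of the shared helper is that neither call site can drift: a lookup miss like the one described in the entry requires two different spellings of the same wing, which a single normalizer rules out.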
@@ -177,6 +178,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 - Hall detection — routes drawer content to `emotions` / `technical` / `family` / `memory` / `identity` / `consciousness` / `creative` halls, enabling hall-based graph connectivity within wings (#835)
 
 ### Bug Fixes
+- Repair `max_seq_id` corruption caused by `_fix_blob_seq_ids` misinterpreting chromadb 1.5.x's sysdb-10 BLOB format (`b'\x11\x11'` + ASCII digits) as legacy 0.6.x big-endian BLOBs. The shim now skips the `max_seq_id` table entirely and guards the `embeddings` branch with a prefix check. New subcommand `mempalace repair --mode max-seq-id [--from-sidecar <path>]` restores affected palaces. Fixes silent drawer-write drops that began after chromadb 1.5.x upgrades on palaces that still had BLOB-typed `max_seq_id` rows at migration time.
 - Set `hnsw:space=cosine` metadata on all collection creation sites — fixes broken similarity scoring under ChromaDB's default L2 distance (#807, #218)
 - File-level locking prevents duplicate drawers when agents mine the same file concurrently (#784, #826)
 - Hybrid closet+drawer retrieval — closets boost ranking, never gate results (#795)
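
Reviewer note: the `max_seq_id` entry above turns on telling two BLOB encodings apart. The sketch below shows why a prefix check matters, using only the two formats as the changelog describes them (`b'\x11\x11'` followed by ASCII digits, versus a big-endian integer). The function name `decode_seq_id_sketch` is hypothetical and this is not the project's `_fix_blob_seq_ids`. A legacy BLOB that happens to start with the same two bytes would still be ambiguous here.

```python
SYSDB10_PREFIX = b"\x11\x11"  # prefix quoted in the changelog entry


def decode_seq_id_sketch(blob: bytes) -> int:
    """Hypothetical decoder for illustration; not the project's shim."""
    if blob.startswith(SYSDB10_PREFIX):
        # Newer format per the changelog: prefix, then ASCII digits.
        return int(blob[len(SYSDB10_PREFIX):].decode("ascii"))
    # Legacy format per the changelog: a big-endian integer.
    return int.from_bytes(blob, "big")


new_style = SYSDB10_PREFIX + b"42"
legacy = (42).to_bytes(8, "big")

# Without the prefix check, the new-style BLOB decodes to a wrong, huge value.
misread = int.from_bytes(new_style, "big")
print(decode_seq_id_sketch(new_style), decode_seq_id_sketch(legacy), misread != 42)
```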

hooks/README.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ Edit `mempal_save_hook.sh` to change:
 
 - **`SAVE_INTERVAL=15`** — How many human messages between saves. Lower = more frequent saves, higher = less interruption.
 - **`STATE_DIR`** — Where hook state is stored (defaults to `~/.mempalace/hook_state/`)
-- **`MEMPAL_DIR`** — Optional. Set to a conversations directory to auto-run `mempalace mine <dir>` on each save trigger. Leave blank (default) to let the AI handle saving via the block reason message.
+- **`MEMPAL_DIR`** — Optional **project directory** (code, notes, docs) to also mine on each save trigger, with `--mode projects`. The hook ALWAYS mines the active conversation transcript automatically with `--mode convos`; `MEMPAL_DIR` is purely additive, never an override. Leave blank if you don't want to ingest project files.
 - **`MEMPAL_PYTHON`** — Optional env var. Python interpreter with mempalace + chromadb installed. Auto-detects: `MEMPAL_PYTHON` env var → repo `venv/bin/python3` → system `python3`. Set this if your venv is in a non-standard location.
 
 ### mempalace CLI
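
Reviewer note: the "additive, never an override" wording for `MEMPAL_DIR` amounts to building a list of targets rather than picking one. A minimal sketch of that idea is below. The helper name `mine_targets_sketch` is hypothetical, and it skips the file and directory existence checks that the shell hooks perform.

```python
import os
from typing import List, Tuple


def mine_targets_sketch(transcript_path: str, mempal_dir: str) -> List[Tuple[str, str]]:
    """Return (directory, mode) pairs; hypothetical helper, not the hook's code."""
    targets: List[Tuple[str, str]] = []
    if transcript_path:
        # The transcript's parent directory is mined as conversations.
        targets.append((os.path.dirname(transcript_path), "convos"))
    if mempal_dir:
        # MEMPAL_DIR adds a second target; it never replaces the first.
        targets.append((mempal_dir, "projects"))
    return targets


print(mine_targets_sketch("/tmp/sessions/s.jsonl", "/home/u/app"))
```

With the pre-merge behaviour described in the removed line, setting `MEMPAL_DIR` would have been the only mined target; with a list, both entries survive.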

hooks/mempal_precompact_hook.sh

Lines changed: 55 additions & 12 deletions
@@ -41,17 +41,18 @@
 # to save everything. After the AI saves, compaction proceeds normally.
 #
 # === MEMPALACE CLI ===
-# This repo uses: mempalace mine <dir>
-# or: mempalace mine <dir> --mode convos
-# Set MEMPAL_DIR below if you want the hook to auto-ingest before compaction.
-# Leave blank to rely on the AI's own save instructions.
+# The hook ALWAYS mines the active conversation transcript synchronously
+# before compaction (via `mempalace mine <transcript-dir> --mode convos`).
+# MEMPAL_DIR is an *additional*, optional target for project files — it
+# does not replace the conversation mine.
 
 STATE_DIR="$HOME/.mempalace/hook_state"
 mkdir -p "$STATE_DIR"
 
-# Optional: set to the directory you want auto-ingested before compaction.
-# Example: MEMPAL_DIR="$HOME/conversations"
-# Leave empty to skip auto-ingest (AI handles saving via the block reason).
+# Optional: project directory (code / notes / docs) to also mine before
+# compaction. Mined with `--mode projects`. The conversation transcript
+# is always mined regardless — this is purely additive.
+# Example: MEMPAL_DIR="$HOME/projects/my_app"
 MEMPAL_DIR=""
 
 # Resolve the Python interpreter. Same contract as mempal_save_hook.sh:
@@ -64,15 +65,57 @@ fi
 # Read JSON input from stdin
 INPUT=$(cat)
 
-SESSION_ID=$(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "import sys,json; print(json.load(sys.stdin).get('session_id','unknown'))" 2>/dev/null)
+# Parse session_id and transcript_path in one call. Sanitize both, then
+# read sanitized values from one-per-line stdout into shell variables —
+# avoids ``eval`` on generated code (#1231 review). Same contract as
+# mempal_save_hook.sh.
+mapfile -t _mempal_parsed < <(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
+import sys, json, re
+data = json.load(sys.stdin)
+sid = data.get('session_id', 'unknown')
+tp = data.get('transcript_path', '')
+safe = lambda s: re.sub(r'[^a-zA-Z0-9_/.\-~]', '', str(s))
+print(safe(sid))
+print(safe(tp))
+" 2>/dev/null)
+SESSION_ID="${_mempal_parsed[0]:-unknown}"
+TRANSCRIPT_PATH="${_mempal_parsed[1]:-}"
+
+# Expand ~ in path
+TRANSCRIPT_PATH="${TRANSCRIPT_PATH/#\~/$HOME}"
+
+# Validate that TRANSCRIPT_PATH looks like a transcript file. Mirrors
+# mempalace.hooks_cli._validate_transcript_path so the shell hook
+# rejects the same shapes the Python hook rejects (#1231 review).
+is_valid_transcript_path() {
+    local path="$1"
+    [ -n "$path" ] || return 1
+    case "$path" in
+        *.json|*.jsonl) ;;
+        *) return 1 ;;
+    esac
+    case "/$path/" in
+        */../*) return 1 ;;
+    esac
+    return 0
+}
 
 echo "[$(date '+%H:%M:%S')] PRE-COMPACT triggered for session $SESSION_ID" >> "$STATE_DIR/hook.log"
 
-# Optional: run mempalace ingest synchronously so memories land before compaction
+# Run ingest synchronously so memories land before compaction. Two
+# independent targets — both run if both are set:
+#   1. TRANSCRIPT_PATH (from Claude Code) → parent dir, --mode convos
+#   2. MEMPAL_DIR → --mode projects
+if is_valid_transcript_path "$TRANSCRIPT_PATH" && [ -f "$TRANSCRIPT_PATH" ]; then
+    mempalace mine "$(dirname "$TRANSCRIPT_PATH")" --mode convos \
+        >> "$STATE_DIR/hook.log" 2>&1
+elif [ -n "$TRANSCRIPT_PATH" ]; then
+    echo "[$(date '+%H:%M:%S')] Skipping invalid transcript path: $TRANSCRIPT_PATH" \
+        >> "$STATE_DIR/hook.log"
+fi
 if [ -n "$MEMPAL_DIR" ] && [ -d "$MEMPAL_DIR" ]; then
-    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-    REPO_DIR="$(dirname "$SCRIPT_DIR")"
-    mempalace mine "$MEMPAL_DIR" >> "$STATE_DIR/hook.log" 2>&1
+    mempalace mine "$MEMPAL_DIR" --mode projects \
+        >> "$STATE_DIR/hook.log" 2>&1
 fi
 
 # Silent: return empty JSON to not block. "decision": "allow" is invalid —
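
Reviewer note: the inline Python in this hook prints one sanitized value per line, which only works because the allow-list also strips newlines and shell metacharacters. The sketch below lifts the same regex into a standalone function to show what survives. The wrapper `parse_hook_input_sketch` and the sample payload are hypothetical. The regex itself is copied from the hook.

```python
import json
import re


def safe(value) -> str:
    # Same character allow-list as the hook's inline Python.
    return re.sub(r"[^a-zA-Z0-9_/.\-~]", "", str(value))


def parse_hook_input_sketch(raw: str) -> list:
    """Return [session_id, transcript_path], each sanitized; illustrative only."""
    data = json.loads(raw)
    return [
        safe(data.get("session_id", "unknown")),
        safe(data.get("transcript_path", "")),
    ]


# Hypothetical hostile payload: command substitution and a trailing command.
payload = json.dumps({
    "session_id": "abc$(rm -rf ~)",
    "transcript_path": "~/t/s.jsonl; echo hi",
})
print(parse_hook_input_sketch(payload))
```

Note that the sanitizer mangles the hostile path rather than rejecting it, which is presumably why the hook also runs `is_valid_transcript_path` and the `-f` test before mining.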

hooks/mempal_save_hook.sh

Lines changed: 53 additions & 24 deletions
@@ -45,20 +45,21 @@
 # stop_hook_active=true so we let it through. No infinite loop.
 #
 # === MEMPALACE CLI ===
-# This repo uses: mempalace mine <dir>
-# or: mempalace mine <dir> --mode convos
-# Set MEMPAL_DIR below if you want the hook to auto-ingest after blocking.
-# Leave blank to rely on the AI's own save instructions.
+# The hook ALWAYS mines the active conversation transcript automatically
+# (via `mempalace mine <transcript-dir> --mode convos`). MEMPAL_DIR is an
+# *additional*, optional target for project files — it does not replace
+# the conversation mine.
 #
 # === CONFIGURATION ===
 
 SAVE_INTERVAL=15 # Save every N human messages (adjust to taste)
 STATE_DIR="$HOME/.mempalace/hook_state"
 mkdir -p "$STATE_DIR"
 
-# Optional: set to the directory you want auto-ingested on each save trigger.
-# Example: MEMPAL_DIR="$HOME/conversations"
-# Leave empty to skip auto-ingest (AI handles saving via the block reason).
+# Optional: project directory (code / notes / docs) to also mine each
+# save trigger. Mined with `--mode projects`. The conversation transcript
+# is always mined regardless — this is purely additive.
+# Example: MEMPAL_DIR="$HOME/projects/my_app"
 MEMPAL_DIR=""
 
 # Resolve the Python interpreter the hook should use.
@@ -82,9 +83,11 @@ fi
 INPUT=$(cat)
 
 # Parse all fields in a single Python call (3x faster than separate invocations)
-# SECURITY: All values are sanitized before being interpolated into shell assignments.
-# stop_hook_active is coerced to a strict True/False to prevent command injection via eval.
-eval $(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
+# without invoking ``eval`` on generated code: Python prints one sanitized
+# value per line, the shell reads them via ``mapfile`` and does plain
+# variable assignment — same data, smaller blast radius if the sanitizer
+# is ever bypassed (#1231 review).
+mapfile -t _mempal_parsed < <(echo "$INPUT" | "$MEMPAL_PYTHON_BIN" -c "
 import sys, json, re
 data = json.load(sys.stdin)
 sid = data.get('session_id', 'unknown')
@@ -94,14 +97,36 @@ tp = data.get('transcript_path', '')
 safe = lambda s: re.sub(r'[^a-zA-Z0-9_/.\-~]', '', str(s))
 # Coerce stop_hook_active to strict boolean string
 sha = 'True' if sha_raw is True or str(sha_raw).lower() in ('true', '1', 'yes') else 'False'
-print(f'SESSION_ID=\"{safe(sid)}\"')
-print(f'STOP_HOOK_ACTIVE=\"{sha}\"')
-print(f'TRANSCRIPT_PATH=\"{safe(tp)}\"')
+print(safe(sid))
+print(sha)
+print(safe(tp))
 " 2>/dev/null)
+SESSION_ID="${_mempal_parsed[0]:-unknown}"
+STOP_HOOK_ACTIVE="${_mempal_parsed[1]:-False}"
+TRANSCRIPT_PATH="${_mempal_parsed[2]:-}"
 
 # Expand ~ in path
 TRANSCRIPT_PATH="${TRANSCRIPT_PATH/#\~/$HOME}"
 
+# Validate that TRANSCRIPT_PATH looks like a transcript file:
+#   - non-empty
+#   - .jsonl or .json suffix
+#   - no traversal segments (.. components)
+# Mirrors mempalace.hooks_cli._validate_transcript_path so the shell hook
+# rejects the same shapes the Python hook rejects (#1231 review).
+is_valid_transcript_path() {
+    local path="$1"
+    [ -n "$path" ] || return 1
+    case "$path" in
+        *.json|*.jsonl) ;;
+        *) return 1 ;;
+    esac
+    case "/$path/" in
+        */../*) return 1 ;;
+    esac
+    return 0
+}
+
 # If we're already in a save cycle, let the AI stop normally
 # This is the infinite-loop prevention: block once → AI saves → tries to stop again → we let it through
 if [ "$STOP_HOOK_ACTIVE" = "True" ] || [ "$STOP_HOOK_ACTIVE" = "true" ]; then
@@ -157,19 +182,23 @@ if [ "$SINCE_LAST" -ge "$SAVE_INTERVAL" ] && [ "$EXCHANGE_COUNT" -gt 0 ]; then
 
     echo "[$(date '+%H:%M:%S')] TRIGGERING SAVE at exchange $EXCHANGE_COUNT" >> "$STATE_DIR/hook.log"
 
-    # Auto-mine the transcript. Two paths:
-    #   1. TRANSCRIPT_PATH (from Claude Code) — mine the directory it lives in
-    #   2. MEMPAL_DIR (user-configured) — mine that directory
-    # At least one should work. If neither is set, nothing mines.
-    MINE_DIR=""
-    if [ -n "$TRANSCRIPT_PATH" ] && [ -f "$TRANSCRIPT_PATH" ]; then
-        MINE_DIR="$(dirname "$TRANSCRIPT_PATH")"
+    # Auto-mine. Two independent targets — both run if both are set:
+    #   1. TRANSCRIPT_PATH (from Claude Code) → parent dir, --mode convos
+    #      (Claude Code session JSONL — must use the convo miner)
+    #   2. MEMPAL_DIR (user-configured project) → --mode projects
+    #      (code, notes, docs)
+    # MEMPAL_DIR is *additive*, not an override: a user with MEMPAL_DIR
+    # pointed at their project still gets the active conversation mined.
+    if is_valid_transcript_path "$TRANSCRIPT_PATH" && [ -f "$TRANSCRIPT_PATH" ]; then
+        mempalace mine "$(dirname "$TRANSCRIPT_PATH")" --mode convos \
+            >> "$STATE_DIR/hook.log" 2>&1 &
+    elif [ -n "$TRANSCRIPT_PATH" ]; then
+        echo "[$(date '+%H:%M:%S')] Skipping invalid transcript path: $TRANSCRIPT_PATH" \
+            >> "$STATE_DIR/hook.log"
     fi
     if [ -n "$MEMPAL_DIR" ] && [ -d "$MEMPAL_DIR" ]; then
-        MINE_DIR="$MEMPAL_DIR"
-    fi
-    if [ -n "$MINE_DIR" ]; then
-        mempalace mine "$MINE_DIR" >> "$STATE_DIR/hook.log" 2>&1 &
+        mempalace mine "$MEMPAL_DIR" --mode projects \
+            >> "$STATE_DIR/hook.log" 2>&1 &
     fi
 
     # MEMPAL_VERBOSE toggle:
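
Reviewer note: the three checks in `is_valid_transcript_path` (non-empty, `.json`/`.jsonl` suffix, no `..` component) translate to Python roughly as below. This is a rendering of the shell logic in this diff only. The body of `mempalace.hooks_cli._validate_transcript_path` is not shown here and may differ, and the name `is_valid_transcript_path_sketch` is hypothetical.

```python
def is_valid_transcript_path_sketch(path: str) -> bool:
    """Python rendering of the shell checks in this diff; illustrative only."""
    if not path:
        return False
    if not path.endswith((".json", ".jsonl")):
        return False
    # Wrapping the path in slashes lets one substring test catch a leading
    # or interior ".." component, like the shell's `case "/$path/"`.
    return "/../" not in f"/{path}/"


for candidate in ["/home/u/.claude/s.jsonl", "../s.json", "/tmp/notes.txt", ""]:
    print(repr(candidate), is_valid_transcript_path_sketch(candidate))
```

A name that merely contains two dots, such as `/a/b..c/s.json`, still passes, because only a whole `..` path component is treated as traversal.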
