Skip to content

Commit 95a5fdb

Browse files
committed
docs: add PreCompact deadlock writeup for MemPalace#863
Standalone writeup of the symptom, root cause, and fix published in MemPalace#863 (mvalentsev). Includes the session-log repro, the "decision: allow is invalid" correction from MemPalace#872, the escape hatch for anyone stuck on an unpatched version, and the rationale for removing the block entirely vs keeping it with a stateful guard. Net-new file — docs/bugfixes/ does not exist on develop yet.
1 parent 4de9e13 commit 95a5fdb

1 file changed

Lines changed: 163 additions & 0 deletions

File tree

Lines changed: 163 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,163 @@
1+
# PreCompact Hook Deadlock
2+
3+
**Status:** Fixed in [#863](https://github.com/MemPalace/mempalace/pull/863) on `develop`
4+
**Affects:** All MemPalace versions through v3.3.0 when used with the Claude Code harness
5+
**Related issues:** [#856](https://github.com/MemPalace/mempalace/issues/856), [#858](https://github.com/MemPalace/mempalace/issues/858), [#872](https://github.com/MemPalace/mempalace/issues/872)
6+
**Symptom:** Session appears frozen near the context limit. `/compact` has no effect. `~/.mempalace/hook_state/hook.log` shows repeated `PRE-COMPACT triggered for session …` entries, many per minute, with no intervening `Stop` hook activity.
7+
8+
## TL;DR
9+
10+
`hook_precompact()` in `mempalace/hooks_cli.py` unconditionally returned `{"decision": "block", "reason": PRECOMPACT_BLOCK_REASON}`. In the Claude Code harness, `decision: block` on a PreCompact hook **cancels the compaction** and feeds the `reason` string back to the model as an instruction. The model then tries to save memory, the response ends, and Claude Code notices the context is still over the limit — so it fires PreCompact again. The hook blocks again. The loop never terminates on its own, and manual `/compact` was also blocked because the hook ignored the `trigger` field.
11+
12+
The fix (PR #863) removes the block entirely. `hook_precompact()` now mines the transcript synchronously (so memory lands before compaction proceeds) and returns `{}` — the documented no-block pass-through in Claude Code. No state files, no trigger-field special-casing, no re-fire cycle. This also aligns `hooks_cli.py` with the standalone bash hooks under `hooks/`, which had already switched to "allow + background mine" a while back, and with the CLAUDE.md principle of "background everything — zero tokens spent on bookkeeping in the chat window."
13+
14+
## How the deadlock was observed
15+
16+
One affected session in the wild (`~/.mempalace/hook_state/hook.log`):
17+
18+
```
19+
[15:27:02] Session 082d4cc3-…: 173 exchanges, 14 since last save
20+
[16:13:45] PRE-COMPACT triggered for session 082d4cc3-…
21+
[16:16:31] PRE-COMPACT triggered for session 082d4cc3-…
22+
[16:18:56] PRE-COMPACT triggered for session 082d4cc3-…
23+
[16:21:29] PRE-COMPACT triggered for session 082d4cc3-…
24+
[16:22:05] PRE-COMPACT triggered for session 082d4cc3-…
25+
[16:23:57] PRE-COMPACT triggered for session 082d4cc3-…
26+
[16:28:05] PRE-COMPACT triggered for session 082d4cc3-…
27+
```
28+
29+
Eight PreCompact fires in 15 minutes. Zero `Stop` hook entries in between, because the session never got control back to a clean "response done" state — every time the model finished writing memory, Claude Code immediately re-attempted compaction, which immediately re-fired the hook, which immediately re-blocked.
30+
31+
Invoking `/compact` manually did not help: Claude Code sends the same `PreCompact` event for manual compactions (just with `trigger: "manual"` in the payload), and the old code didn't read that field.
32+
33+
## Root cause in the old code
34+
35+
```python
36+
# mempalace/hooks_cli.py (before the fix)
37+
def hook_precompact(data: dict, harness: str):
38+
"""Precompact hook: always block with comprehensive save instruction."""
39+
parsed = _parse_harness_input(data, harness)
40+
session_id = parsed["session_id"]
41+
42+
_log(f"PRE-COMPACT triggered for session {session_id}")
43+
44+
# ... optional auto-ingest ...
45+
46+
# Always block -- compaction = save everything
47+
_output({"decision": "block", "reason": PRECOMPACT_BLOCK_REASON})
48+
```
49+
50+
Compare to the Stop hook, which already had a loop guard (`stop_hook_active`, read from the harness payload) so that a save cycle triggered by a previous block would pass through instead of re-blocking. The PreCompact event has no equivalent `precompact_hook_active` flag, so *any* fix that kept the block would need to maintain its own cross-invocation state. #863 sidesteps this entirely by removing the block.
51+
52+
### An important aside: `"allow"` is not a valid decision value
53+
54+
One earlier iteration (see [#872](https://github.com/MemPalace/mempalace/issues/872)) returned `{"decision": "allow"}` to mean "don't block." That **happens to work, but by accident**`"block"` is the only top-level `decision` value Claude Code recognizes on this hook. Anything else (`"allow"`, `"pass"`, an unknown string, a missing key) is treated as a no-op pass-through. The documented way to not block is to return `{}`. The same correction applies to any older bash hooks that may still be using `"allow"`.
55+
56+
## The fix (#863)
57+
58+
```python
59+
def hook_precompact(data: dict, harness: str):
60+
"""Precompact hook: mine transcript synchronously, then allow compaction."""
61+
parsed = _parse_harness_input(data, harness)
62+
session_id = parsed["session_id"]
63+
transcript_path = parsed["transcript_path"]
64+
65+
_log(f"PRE-COMPACT triggered for session {session_id}")
66+
67+
# Mine synchronously so data lands before compaction proceeds
68+
_mine_sync(transcript_path)
69+
70+
_output({})
71+
```
72+
73+
With supporting helpers that also enable mining when `MEMPAL_DIR` isn't set:
74+
75+
```python
76+
def _get_mine_dir(transcript_path: str = "") -> str:
77+
"""Determine directory to mine from MEMPAL_DIR or transcript path."""
78+
mempal_dir = os.environ.get("MEMPAL_DIR", "")
79+
if mempal_dir and os.path.isdir(mempal_dir):
80+
return mempal_dir
81+
if transcript_path:
82+
path = Path(transcript_path).expanduser()
83+
if path.is_file():
84+
return str(path.parent)
85+
return ""
86+
87+
88+
def _mine_sync(transcript_path: str = ""):
89+
"""Run mempalace mine synchronously (for precompact -- data must land first)."""
90+
mine_dir = _get_mine_dir(transcript_path)
91+
if not mine_dir:
92+
return
93+
try:
94+
STATE_DIR.mkdir(parents=True, exist_ok=True)
95+
log_path = STATE_DIR / "hook.log"
96+
with open(log_path, "a") as log_f:
97+
subprocess.run(
98+
[sys.executable, "-m", "mempalace", "mine", mine_dir],
99+
stdout=log_f,
100+
stderr=log_f,
101+
timeout=60,
102+
)
103+
except (OSError, subprocess.TimeoutExpired):
104+
pass
105+
```
106+
107+
### Behavior after the fix
108+
109+
| Scenario | Before | After |
110+
| --------------------------------------------------------- | ------------------ | -------------------- |
111+
| User runs `/compact` (`trigger="manual"`) | blocked → deadlock | passes through (`{}`) |
112+
| Auto PreCompact at context threshold | blocked → loop | passes through, mines synchronously first |
113+
| Re-fire of PreCompact seconds later | blocks again → loop | already mined, passes through again |
114+
| `MEMPAL_DIR` unset | no mining at all | mines from transcript parent dir |
115+
116+
## Escape hatch for a frozen session (pre-fix)
117+
118+
If you are currently stuck in the loop on an unpatched MemPalace, the only reliable workaround is to exit the Claude Code session and start a new one with `claude --continue`. No amount of `/compact` will break the loop until `hooks_cli.py` is replaced — manual and auto compactions fire the same hook and both get blocked.
119+
120+
If you had a prior deadlock-guard variant installed (some downstream patches wrote `~/.mempalace/hook_state/{session_id}_precompact_blocked_at` sentinel files as a workaround), those files are harmless but no longer needed and can be removed:
121+
122+
```bash
123+
rm -f ~/.mempalace/hook_state/*_precompact_blocked_at
124+
```
125+
126+
## Verifying you have the fix
127+
128+
```bash
129+
# Should NOT contain `"decision": "block"` in hook_precompact
130+
grep -A 20 'def hook_precompact' "$(python3 -c 'import mempalace.hooks_cli as m; print(m.__file__)')" \
131+
| grep -q '"decision": "block"' && echo "NOT patched" || echo "patched"
132+
```
133+
134+
Or run the smoke test directly:
135+
136+
```bash
137+
echo '{"session_id":"t","transcript_path":"/nonexistent","trigger":"auto"}' \
138+
| python3 -m mempalace hook run --hook precompact --harness claude-code
139+
# Expect: {}
140+
141+
echo '{"session_id":"t","transcript_path":"/nonexistent","trigger":"manual"}' \
142+
| python3 -m mempalace hook run --hook precompact --harness claude-code
143+
# Expect: {}
144+
```
145+
146+
## Tests
147+
148+
Unit coverage in `tests/test_hooks_cli.py` (from #863):
149+
150+
- `test_precompact_allows` — precompact returns `{}`, not a block.
151+
- `test_get_mine_dir_*` — coverage for the MEMPAL_DIR → transcript-parent fallback.
152+
- `test_mine_sync_*` — synchronous mining is invoked on precompact.
153+
154+
## Why not keep the PreCompact block and just guard against the loop?
155+
156+
An earlier proposal ([#867](https://github.com/MemPalace/mempalace/pull/867)) kept the block and added a stateful deadlock guard: a per-session sentinel file that tracked the human-message count at which the last block fired, plus a `trigger == "manual"` bypass. It works, but it layers a workaround on top of a premise that doesn't hold:
157+
158+
1. The Stop hook already runs the same save logic every `SAVE_INTERVAL` human messages. By the time PreCompact fires, the session is typically already caught up to within a few turns of its last save.
159+
2. Blocking PreCompact burns a whole model turn on save instructions that the Stop hook would have issued anyway on the next response boundary.
160+
3. CLAUDE.md explicitly calls for "background everything — zero tokens spent on bookkeeping in the chat window."
161+
4. The standalone bash hooks in `hooks/` had already moved to "allow + background mine" some time ago; `hooks_cli.py` was drift.
162+
163+
Removing the block resolves all four points at once and is strictly simpler: no state files, no trigger-field special-casing, no per-session counter, no subtle re-arm logic. The synchronous mine in #863 preserves the "memory lands before compaction" guarantee that the block was reaching for, without any of the failure modes.

0 commit comments

Comments
 (0)