|
| 1 | +# PreCompact Hook Deadlock |
| 2 | + |
| 3 | +**Status:** Fixed in [#863](https://github.com/MemPalace/mempalace/pull/863) on `develop` |
| 4 | +**Affects:** All MemPalace versions through v3.3.0 when used with the Claude Code harness |
| 5 | +**Related issues:** [#856](https://github.com/MemPalace/mempalace/issues/856), [#858](https://github.com/MemPalace/mempalace/issues/858), [#872](https://github.com/MemPalace/mempalace/issues/872) |
| 6 | +**Symptom:** Session appears frozen near the context limit. `/compact` has no effect. `~/.mempalace/hook_state/hook.log` shows repeated `PRE-COMPACT triggered for session …` entries, many per minute, with no intervening `Stop` hook activity. |
| 7 | + |
| 8 | +## TL;DR |
| 9 | + |
| 10 | +`hook_precompact()` in `mempalace/hooks_cli.py` unconditionally returned `{"decision": "block", "reason": PRECOMPACT_BLOCK_REASON}`. In the Claude Code harness, `decision: block` on a PreCompact hook **cancels the compaction** and feeds the `reason` string back to the model as an instruction. The model then tries to save memory, the response ends, and Claude Code notices the context is still over the limit — so it fires PreCompact again. The hook blocks again. The loop never terminates on its own, and manual `/compact` was also blocked because the hook ignored the `trigger` field. |
| 11 | + |
| 12 | +The fix (PR #863) removes the block entirely. `hook_precompact()` now mines the transcript synchronously (so memory lands before compaction proceeds) and returns `{}` — the documented no-block pass-through in Claude Code. No state files, no trigger-field special-casing, no re-fire cycle. This also aligns `hooks_cli.py` with the standalone bash hooks under `hooks/`, which had already switched to "allow + background mine" a while back, and with the CLAUDE.md principle of "background everything — zero tokens spent on bookkeeping in the chat window." |
| 13 | + |
| 14 | +## How the deadlock was observed |
| 15 | + |
| 16 | +One affected session in the wild (`~/.mempalace/hook_state/hook.log`): |
| 17 | + |
| 18 | +``` |
| 19 | +[15:27:02] Session 082d4cc3-…: 173 exchanges, 14 since last save |
| 20 | +[16:13:45] PRE-COMPACT triggered for session 082d4cc3-… |
| 21 | +[16:16:31] PRE-COMPACT triggered for session 082d4cc3-… |
| 22 | +[16:18:56] PRE-COMPACT triggered for session 082d4cc3-… |
| 23 | +[16:21:29] PRE-COMPACT triggered for session 082d4cc3-… |
| 24 | +[16:22:05] PRE-COMPACT triggered for session 082d4cc3-… |
| 25 | +[16:23:57] PRE-COMPACT triggered for session 082d4cc3-… |
| 26 | +[16:28:05] PRE-COMPACT triggered for session 082d4cc3-… |
| 27 | +``` |
| 28 | + |
| 29 | +Eight PreCompact fires in 15 minutes. Zero `Stop` hook entries in between, because the session never got control back to a clean "response done" state — every time the model finished writing memory, Claude Code immediately re-attempted compaction, which immediately re-fired the hook, which immediately re-blocked. |
| 30 | + |
| 31 | +Invoking `/compact` manually did not help: Claude Code sends the same `PreCompact` event for manual compactions (just with `trigger: "manual"` in the payload), and the old code didn't read that field. |
| 32 | + |
| 33 | +## Root cause in the old code |
| 34 | + |
| 35 | +```python |
| 36 | +# mempalace/hooks_cli.py (before the fix) |
| 37 | +def hook_precompact(data: dict, harness: str): |
| 38 | + """Precompact hook: always block with comprehensive save instruction.""" |
| 39 | + parsed = _parse_harness_input(data, harness) |
| 40 | + session_id = parsed["session_id"] |
| 41 | + |
| 42 | + _log(f"PRE-COMPACT triggered for session {session_id}") |
| 43 | + |
| 44 | + # ... optional auto-ingest ... |
| 45 | + |
| 46 | + # Always block -- compaction = save everything |
| 47 | + _output({"decision": "block", "reason": PRECOMPACT_BLOCK_REASON}) |
| 48 | +``` |
| 49 | + |
| 50 | +Compare to the Stop hook, which already had a loop guard (`stop_hook_active`, read from the harness payload) so that a save cycle triggered by a previous block would pass through instead of re-blocking. The PreCompact event has no equivalent `precompact_hook_active` flag, so *any* fix that kept the block would need to maintain its own cross-invocation state. #863 sidesteps this entirely by removing the block. |
| 51 | + |
| 52 | +### An important aside: `"allow"` is not a valid decision value |
| 53 | + |
| 54 | +One earlier iteration (see [#872](https://github.com/MemPalace/mempalace/issues/872)) returned `{"decision": "allow"}` to mean "don't block." That **happens to work, but by accident** — `"block"` is the only top-level `decision` value Claude Code recognizes on this hook. Anything else (`"allow"`, `"pass"`, an unknown string, a missing key) is treated as a no-op pass-through. The documented way to not block is to return `{}`. The same correction applies to any older bash hooks that may still be using `"allow"`. |
| 55 | + |
| 56 | +## The fix (#863) |
| 57 | + |
| 58 | +```python |
| 59 | +def hook_precompact(data: dict, harness: str): |
| 60 | + """Precompact hook: mine transcript synchronously, then allow compaction.""" |
| 61 | + parsed = _parse_harness_input(data, harness) |
| 62 | + session_id = parsed["session_id"] |
| 63 | + transcript_path = parsed["transcript_path"] |
| 64 | + |
| 65 | + _log(f"PRE-COMPACT triggered for session {session_id}") |
| 66 | + |
| 67 | + # Mine synchronously so data lands before compaction proceeds |
| 68 | + _mine_sync(transcript_path) |
| 69 | + |
| 70 | + _output({}) |
| 71 | +``` |
| 72 | + |
| 73 | +With supporting helpers that also enable mining when `MEMPAL_DIR` isn't set: |
| 74 | + |
| 75 | +```python |
| 76 | +def _get_mine_dir(transcript_path: str = "") -> str: |
| 77 | + """Determine directory to mine from MEMPAL_DIR or transcript path.""" |
| 78 | + mempal_dir = os.environ.get("MEMPAL_DIR", "") |
| 79 | + if mempal_dir and os.path.isdir(mempal_dir): |
| 80 | + return mempal_dir |
| 81 | + if transcript_path: |
| 82 | + path = Path(transcript_path).expanduser() |
| 83 | + if path.is_file(): |
| 84 | + return str(path.parent) |
| 85 | + return "" |
| 86 | + |
| 87 | + |
| 88 | +def _mine_sync(transcript_path: str = ""): |
| 89 | + """Run mempalace mine synchronously (for precompact -- data must land first).""" |
| 90 | + mine_dir = _get_mine_dir(transcript_path) |
| 91 | + if not mine_dir: |
| 92 | + return |
| 93 | + try: |
| 94 | + STATE_DIR.mkdir(parents=True, exist_ok=True) |
| 95 | + log_path = STATE_DIR / "hook.log" |
| 96 | + with open(log_path, "a") as log_f: |
| 97 | + subprocess.run( |
| 98 | + [sys.executable, "-m", "mempalace", "mine", mine_dir], |
| 99 | + stdout=log_f, |
| 100 | + stderr=log_f, |
| 101 | + timeout=60, |
| 102 | + ) |
| 103 | + except (OSError, subprocess.TimeoutExpired): |
| 104 | + pass |
| 105 | +``` |
| 106 | + |
| 107 | +### Behavior after the fix |
| 108 | + |
| 109 | +| Scenario | Before | After | |
| 110 | +| --------------------------------------------------------- | ------------------ | -------------------- | |
| 111 | +| User runs `/compact` (`trigger="manual"`) | blocked → deadlock | passes through (`{}`) | |
| 112 | +| Auto PreCompact at context threshold | blocked → loop | passes through, mines synchronously first | |
| 113 | +| Re-fire of PreCompact seconds later | blocks again → loop | already mined, passes through again | |
| 114 | +| `MEMPAL_DIR` unset | no mining at all | mines from transcript parent dir | |
| 115 | + |
| 116 | +## Escape hatch for a frozen session (pre-fix) |
| 117 | + |
| 118 | +If you are currently stuck in the loop on an unpatched MemPalace, the only reliable workaround is to exit the Claude Code session and start a new one with `claude --continue`. No amount of `/compact` will break the loop until `hooks_cli.py` is replaced — manual and auto compactions fire the same hook and both get blocked. |
| 119 | + |
| 120 | +If you had a prior deadlock-guard variant installed (some downstream patches wrote `~/.mempalace/hook_state/{session_id}_precompact_blocked_at` sentinel files as a workaround), those files are harmless but no longer needed and can be removed: |
| 121 | + |
| 122 | +```bash |
| 123 | +rm -f ~/.mempalace/hook_state/*_precompact_blocked_at |
| 124 | +``` |
| 125 | + |
| 126 | +## Verifying you have the fix |
| 127 | + |
| 128 | +```bash |
| 129 | +# Should NOT contain `"decision": "block"` in hook_precompact |
| 130 | +grep -A 20 'def hook_precompact' "$(python3 -c 'import mempalace.hooks_cli as m; print(m.__file__)')" \ |
| 131 | + | grep -q '"decision": "block"' && echo "NOT patched" || echo "patched" |
| 132 | +``` |
| 133 | + |
| 134 | +Or run the smoke test directly: |
| 135 | + |
| 136 | +```bash |
| 137 | +echo '{"session_id":"t","transcript_path":"/nonexistent","trigger":"auto"}' \ |
| 138 | + | python3 -m mempalace hook run --hook precompact --harness claude-code |
| 139 | +# Expect: {} |
| 140 | + |
| 141 | +echo '{"session_id":"t","transcript_path":"/nonexistent","trigger":"manual"}' \ |
| 142 | + | python3 -m mempalace hook run --hook precompact --harness claude-code |
| 143 | +# Expect: {} |
| 144 | +``` |
| 145 | + |
| 146 | +## Tests |
| 147 | + |
| 148 | +Unit coverage in `tests/test_hooks_cli.py` (from #863): |
| 149 | + |
| 150 | +- `test_precompact_allows` — precompact returns `{}`, not a block. |
| 151 | +- `test_get_mine_dir_*` — coverage for the MEMPAL_DIR → transcript-parent fallback. |
| 152 | +- `test_mine_sync_*` — synchronous mining is invoked on precompact. |
| 153 | + |
| 154 | +## Why not keep the PreCompact block and just guard against the loop? |
| 155 | + |
| 156 | +An earlier proposal ([#867](https://github.com/MemPalace/mempalace/pull/867)) kept the block and added a stateful deadlock guard: a per-session sentinel file that tracked the human-message count at which the last block fired, plus a `trigger == "manual"` bypass. It works, but it layers a workaround on top of a premise that doesn't hold: |
| 157 | + |
| 158 | +1. The Stop hook already runs the same save logic every `SAVE_INTERVAL` human messages. By the time PreCompact fires, the session is typically already caught up to within a few turns of its last save. |
| 159 | +2. Blocking PreCompact burns a whole model turn on save instructions that the Stop hook would have issued anyway on the next response boundary. |
| 160 | +3. CLAUDE.md explicitly calls for "background everything — zero tokens spent on bookkeeping in the chat window." |
| 161 | +4. The standalone bash hooks in `hooks/` had already moved to "allow + background mine" some time ago; `hooks_cli.py` was drift. |
| 162 | + |
| 163 | +Removing the block resolves all four points at once and is strictly simpler: no state files, no trigger-field special-casing, no per-session counter, no subtle re-arm logic. The synchronous mine in #863 preserves the "memory lands before compaction" guarantee that the block was reaching for, without any of the failure modes. |
0 commit comments