fix(hooks): prevent PreCompact deadlock in Claude Code harness #867
Robins163 wants to merge 1 commit into MemPalace:develop
hook_precompact unconditionally returned decision=block, which made
Claude Code loop forever near the context limit:
block -> model saves -> response ends -> context still full ->
Claude Code fires PreCompact again -> block again -> ...
Observed in the wild as 8 PreCompact fires in 15 minutes on a single
session with zero Stop hook activity in between. Manual /compact also
deadlocked because the hook ignored the `trigger` field.
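The replay loop described above can be reproduced with a toy model. This is only a sketch: `old_hook_precompact` and `simulate_harness` are hypothetical names, the harness behavior is simplified to "re-fire PreCompact while the hook blocks", and the `max_fires` cap exists only so the example terminates.

```python
# Toy model of the replay loop. None of these names come from Claude Code
# or MemPalace; they are illustrative stand-ins.

def old_hook_precompact(data: dict) -> dict:
    """Stand-in for the pre-fix hook: blocks on every fire, never reads `trigger`."""
    return {"decision": "block", "reason": "save before compaction"}


def simulate_harness(hook, trigger: str = "auto", max_fires: int = 8) -> int:
    """Fire PreCompact until the hook allows compaction or max_fires is reached.

    Returns the number of fires used. The cap is an artifact of the sketch;
    the loop described above has no such limit.
    """
    fires = 0
    while fires < max_fires:
        fires += 1
        result = hook({"session_id": "demo", "trigger": trigger})
        if result.get("decision") != "block":
            break  # compaction proceeds, context shrinks, the loop ends
        # Blocked: the model saves, the response ends, context is still full,
        # so the next iteration fires PreCompact again.
    return fires
```

With the always-blocking hook the simulation runs into the cap for both automatic and manual triggers, while a hook that returns `{}` finishes after a single fire.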
Two guards fix this:
1. If trigger == "manual", pass through. User /compact must never be
blocked.
2. Per-session exchange-count guard: remember the human-message count
at which we blocked in STATE_DIR/{session}_precompact_blocked_at.
If PreCompact fires again with the same count, the save already
ran — release the state and allow compaction. A fresh user turn
re-arms a single block for the next cycle.
The "block once per new user turn to force a save" guarantee is
preserved; only the replay loop is removed.
Tests:
- test_precompact_first_fire_blocks (preserves baseline)
- test_precompact_manual_trigger_passes_through (guard 1)
- test_precompact_deadlock_guard_allows_refire (guard 2, regression)
- test_precompact_new_human_message_rearms_block (guard 2 not over-suppressing)
Full writeup in docs/bugfixes/precompact-deadlock.md.
Hey @Robins163, nice deadlock analysis with the session log -- that pattern is a clear repro of the same problem reported in #856 and #858. I opened #863 for the same root cause but took a different route: instead of adding a deadlock guard around the block, I removed the block entirely. The hook now mines the transcript synchronously via subprocess (so data lands before compaction) and returns [...] The reasoning: the standalone bash hooks in [...] With [...] Wanted to flag the overlap early so we don't duplicate effort.
Correction to my earlier comment: I said our #863 returns [...] Same issue applies to the standalone bash hooks that both our PRs were referencing as prior art.
Thanks @mvalentsev, you're right, and #863 is the better fix. Layering a stateful guard on top of a block that shouldn't exist in the first place is a workaround for a wrong premise. Removing the block aligns [...] Closing this PR in favor of #863. One follow-up offer: my branch has a standalone writeup at [...] Happy to either: [...]
Whichever you prefer. Tests on this branch were written against the stateful-guard semantics so they don't carry over; #863's [...]
Closing: superseded by #863.
Thanks for the kind words and for closing cleanly. Your deadlock analysis with the session log was what made the problem concrete for anyone reading -- the 8-fires-in-15-minutes pattern is a better explanation than any spec quote. A separate docs PR makes sense. The writeup stands on its own and the bugfixes/ directory is worth having as a pattern for the project. Open it against develop whenever you're ready, I'll review.
Summary
`hook_precompact` in `mempalace/hooks_cli.py` unconditionally returned `{"decision": "block", "reason": PRECOMPACT_BLOCK_REASON}`. In the Claude Code harness, that cancels compaction, feeds the `reason` back to the model, and then, because context is still over the limit, Claude Code immediately fires PreCompact again. The hook blocks again. The session deadlocks near the context limit and cannot recover without killing the process.
Manual `/compact` was also caught in the same trap, because the old code ignored the `trigger` field that Claude Code sets to `"manual"` for user-initiated compactions.
Evidence from a real session
Eight PreCompact fires in 15 minutes with zero `Stop` hook activity between them. The model never got back to a clean "response done" state because every response ended with another failed compaction attempt.
Fix
Two guards in `hook_precompact()`:
1. Manual passthrough: if `data["trigger"] == "manual"`, return `{}` immediately. The user ran `/compact`; never block them.
2. Per-session exchange-count guard: when the hook blocks, it records the current human-message count in `STATE_DIR/{session_id}_precompact_blocked_at`. On re-fire, if the current count is still `<= last_blocked_at`, the save already ran. Release the state file and return `{}`, letting compaction proceed. A fresh user message advances the count and re-arms a single block for the next cycle.
The "force a thorough save right before detailed context is lost" guarantee is preserved: PreCompact still blocks once per new user turn. Only the infinite replay loop is removed.
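As a rough illustration of how the two guards fit together, here is a minimal sketch. It is not the code from this branch: the `human_message_count` parameter, the temporary `STATE_DIR`, and the reason string are assumptions made so the example runs on its own, and the real hook derives the count and paths itself.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-ins: the real STATE_DIR and block reason live in
# mempalace/hooks_cli.py and may differ.
STATE_DIR = Path(tempfile.mkdtemp(prefix="precompact_sketch_"))
PRECOMPACT_BLOCK_REASON = "Save context before compaction."


def hook_precompact(data: dict, human_message_count: int) -> dict:
    """Return {} to allow compaction, or a block decision to force a save first."""
    # Guard 1: a user-initiated /compact is never blocked.
    if data.get("trigger") == "manual":
        return {}

    state_file = STATE_DIR / f"{data['session_id']}_precompact_blocked_at"

    # Guard 2: if we already blocked at this message count, the save has run.
    if state_file.exists():
        try:
            last_blocked_at = int(state_file.read_text().strip())
        except ValueError:
            last_blocked_at = -1  # unreadable state: treat as never blocked
        if human_message_count <= last_blocked_at:
            state_file.unlink()  # release state so a later turn can re-arm
            return {}

    # First fire for this user turn: remember the count and block once.
    state_file.write_text(str(human_message_count))
    return {"decision": "block", "reason": PRECOMPACT_BLOCK_REASON}
```

Calling the sketch twice with the same count blocks the first time and passes the second; raising the count while the state file is still present blocks once more, which corresponds to the re-arm case.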
Behavior matrix
`/compact` (`trigger="manual"`)
Tests
`tests/test_hooks_cli.py`: 4 new tests, all passing. Full suite (35 tests) green:
- `test_precompact_first_fire_blocks`: baseline preserved
- `test_precompact_manual_trigger_passes_through`: guard 1
- `test_precompact_deadlock_guard_allows_refire`: guard 2 (main regression)
- `test_precompact_new_human_message_rearms_block`: guard 2 doesn't over-suppress
Docs
Full writeup, including an escape hatch for currently-frozen sessions and a one-liner to verify whether a given install is patched:
- `docs/bugfixes/precompact-deadlock.md`
- `CHANGELOG.md` entry under Unreleased / v3.3.0
Test plan
- `pytest tests/test_hooks_cli.py`: all pass
- `python3 -m mempalace hook run --hook precompact`: manual → `{}`, 1st auto → block, 2nd auto (no new msg) → `{}`