Skip to content

fix(hooks): prevent PreCompact deadlock in Claude Code harness#867

Closed
Robins163 wants to merge 1 commit intoMemPalace:developfrom
Robins163:fix/precompact-deadlock
Closed

fix(hooks): prevent PreCompact deadlock in Claude Code harness#867
Robins163 wants to merge 1 commit intoMemPalace:developfrom
Robins163:fix/precompact-deadlock

Conversation

@Robins163
Copy link
Copy Markdown

Summary

hook_precompact in mempalace/hooks_cli.py unconditionally returned {"decision": "block", "reason": PRECOMPACT_BLOCK_REASON}. In the Claude Code harness, that cancels compaction, feeds the reason back to the model, and then — because context is still over the limit — Claude Code immediately fires PreCompact again. The hook blocks again. The session deadlocks near the context limit and cannot recover without killing the process.

Manual /compact was also caught in the same trap, because the old code ignored the trigger field that Claude Code sets to "manual" for user-initiated compactions.

Evidence from a real session

[15:27:02] Session 082d4cc3-…: 173 exchanges, 14 since last save
[16:13:45] PRE-COMPACT triggered for session 082d4cc3-…
[16:16:31] PRE-COMPACT triggered for session 082d4cc3-…
[16:18:56] PRE-COMPACT triggered for session 082d4cc3-…
[16:21:29] PRE-COMPACT triggered for session 082d4cc3-…
[16:22:05] PRE-COMPACT triggered for session 082d4cc3-…
[16:23:57] PRE-COMPACT triggered for session 082d4cc3-…
[16:28:05] PRE-COMPACT triggered for session 082d4cc3-…

Eight PreCompact fires in 15 minutes with zero Stop hook activity between them — the model never got back to a clean "response done" state because every response ended with another failed compaction attempt.

Fix

Two guards in hook_precompact():

  1. Manual trigger passes through — if data["trigger"] == "manual", return {} immediately. The user ran /compact, never block them.
  2. Per-session exchange-count guard — remember the human-message count at which we blocked in STATE_DIR/{session_id}_precompact_blocked_at. On re-fire, if the current count is still <= last_blocked_at, the save already ran. Release the state file and return {}, letting compaction proceed. A fresh user message advances the count and re-arms a single block for the next cycle.

The "force a thorough save right before detailed context is lost" guarantee is preserved — PreCompact still blocks once per new user turn. Only the infinite replay loop is removed.

Behavior matrix

Scenario Before After
User runs /compact (trigger="manual") blocked → deadlock passes through
1st auto PreCompact after save threshold crossed blocks for save blocks for save
2nd auto PreCompact, no new user message since blocks → loop passes through
Auto PreCompact after a new user message arrives blocks → loop blocks once more

Tests

tests/test_hooks_cli.py — 4 new tests, all passing. Full suite (35 tests) green:

  • test_precompact_first_fire_blocks — baseline preserved
  • test_precompact_manual_trigger_passes_through — guard 1
  • test_precompact_deadlock_guard_allows_refire — guard 2 (main regression)
  • test_precompact_new_human_message_rearms_block — guard 2 doesn't over-suppress
$ pytest tests/test_hooks_cli.py -q
...................................                                      [100%]
35 passed in 0.17s

Docs

Full writeup including escape hatch for currently-frozen sessions and a one-liner to verify whether a given install is patched:

  • docs/bugfixes/precompact-deadlock.md
  • CHANGELOG.md entry under Unreleased / v3.3.0

Test plan

  • pytest tests/test_hooks_cli.py — all pass
  • Smoke test via python3 -m mempalace hook run --hook precompact: manual → {}, 1st auto → block, 2nd auto (no new msg) → {}
  • Verified original deadlock pattern (8 fires in 15 min) is structurally impossible after fix

hook_precompact unconditionally returned decision=block, which made
Claude Code loop forever near the context limit:

  block -> model saves -> response ends -> context still full ->
  Claude Code fires PreCompact again -> block again -> ...

Observed in the wild as 8 PreCompact fires in 15 minutes on a single
session with zero Stop hook activity in between. Manual /compact also
deadlocked because the hook ignored the `trigger` field.

Two guards fix this:

1. If trigger == "manual", pass through. User /compact must never be
   blocked.
2. Per-session exchange-count guard: remember the human-message count
   at which we blocked in STATE_DIR/{session}_precompact_blocked_at.
   If PreCompact fires again with the same count, the save already
   ran — release the state and allow compaction. A fresh user turn
   re-arms a single block for the next cycle.

The "block once per new user turn to force a save" guarantee is
preserved; only the replay loop is removed.

Tests:
- test_precompact_first_fire_blocks (preserves baseline)
- test_precompact_manual_trigger_passes_through (guard 1)
- test_precompact_deadlock_guard_allows_refire (guard 2, regression)
- test_precompact_new_human_message_rearms_block (guard 2 not over-suppressing)

Full writeup in docs/bugfixes/precompact-deadlock.md.
@mvalentsev
Copy link
Copy Markdown
Contributor

Hey @Robins163, nice deadlock analysis with the session log -- that pattern is a clear repro of the same problem reported in #856 and #858.

I opened #863 for the same root cause but took a different route: instead of adding a deadlock guard around the block, I removed the block entirely. The hook now mines the transcript synchronously via subprocess (so data lands before compaction) and returns {"decision": "allow"}. No state files, no re-fire cycle.

The reasoning: the standalone bash hooks in hooks/ were already updated to use allow + background mining a while back, but hooks_cli.py (the plugin entry point) was never aligned. And CLAUDE.md says "background everything -- zero tokens spent on bookkeeping in the chat window," which argues against the first-fire block even with the deadlock guard.

With allow, manual /compact also works without a trigger check since nothing blocks in the first place.

Wanted to flag the overlap early so we don't duplicate effort.

@mvalentsev
Copy link
Copy Markdown
Contributor

Correction to my earlier comment: I said our #863 returns {"decision": "allow"} but that value turned out to be invalid (#872). Just pushed a fix -- the hook now returns {} (empty JSON), which is the documented way to not block in Claude Code. "block" is the only recognized top-level decision value; anything else is a pass-through by accident.

Same issue applies to the standalone bash hooks that both our PRs were referencing as prior art.

@Robins163
Copy link
Copy Markdown
Author

Thanks @mvalentsev — you're right, and #863 is the better fix.

Layering a stateful guard on top of a block that shouldn't exist in the first place is a workaround for a wrong premise. Removing the block aligns hooks_cli.py with the standalone bash hooks under hooks/ and with CLAUDE.md's "background everything, zero tokens spent on bookkeeping in the chat window" principle. And the {} return shape you corrected in #872 is the documented no-block path — I wasn't aware "allow" was a pass-through by accident; good catch, that's worth calling out loudly so downstream patches don't ship it.

Closing this PR in favor of #863.

One follow-up offer: my branch has a standalone writeup at docs/bugfixes/precompact-deadlock.md — session-log repro, re-fire timeline, escape hatch for anyone stuck on an unpatched version, the "allow" correction from #872, and the rationale for removing the block vs keeping it with a guard. It's independent of the code fix and docs/bugfixes/ doesn't exist on develop yet, so it's net-new. I've rebased it onto upstream/develop on branch docs/precompact-deadlock-writeup in my fork (Robins163/mempalace), referencing #863 as the fix.

Happy to either:

  1. Open a separate docs-only PR against develop once fix(hooks): stop precompact hook from blocking compaction (#856, #858) #863 lands, or
  2. Push the doc commit onto your fix/hooks-silent-save branch so it ships in the same PR.

Whichever you prefer. Tests on this branch were written against the stateful-guard semantics so they don't carry over — #863's test_precompact_allows is the right shape.

@Robins163
Copy link
Copy Markdown
Author

Closing — superseded by #863.

@Robins163 Robins163 closed this Apr 14, 2026
@mvalentsev
Copy link
Copy Markdown
Contributor

Thanks for the kind words and for closing cleanly. Your deadlock analysis with the session log was what made the problem concrete for anyone reading -- the 8-fires-in-15-minutes pattern is a better explanation than any spec quote.

A separate docs PR makes sense. The writeup stands on its own and the bugfixes/ directory is worth having as a pattern for the project. Open it against develop whenever you're ready, I'll review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants