fix: skip tool_result messages in save hook exchange count #550
sha2fiddy wants to merge 5 commits into MemPalace:develop
Conversation
|
This is a well-diagnosed fix. The 2.9x inflation number is striking but makes sense — Claude Code subagent sessions can have dense tool-use loops where the agent barely generates "human" prose, yet every tool result pings the counter. The predicate is correct. Filtering on "ALL content blocks are Dual-location fix matters. One edge case to validate: Some older transcript formats (and a few OpenAI-compatible LLMs) send On test coverage: 34 tests for a behavioral change in a save hook is on the lean side — would love to see at least one integration-style test that exercises a full Claude Code-style transcript (interleaved tool calls + real human turns) and confirms the final exchange count is correct end-to-end. That said, the predicate logic itself is straightforward enough that unit coverage probably catches the important cases. Safe no-op for OpenAI-compatible LLMs using [MemPalace-AGI integration — production stats at https://milla-jovovich.github.io/mempalace/integrations/mempalace-agi/] |
web3guru888
left a comment
Good fix for a real problem. The 2.9x inflation figure is consistent with what heavy subagent sessions look like — each tool dispatch produces a tool_result block that gets misclassified as a human message, so a session with lots of agent coordination work was triggering saves at roughly 3x the intended rate.
The all(b.get('type') == 'tool_result' for b in content) guard is the right predicate. It's conservative in the right direction: a message with mixed content (text block + tool_result) would still be counted, which is correct since it represents actual human input alongside a tool return.
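The guard discussed above can be sketched as a minimal counter. This is illustrative only (the function and field names stand in for the real `_count_human_messages()` in `hooks_cli.py`); it shows why a pure tool_result message is skipped while a mixed message still counts:

```python
def count_human_messages(messages):
    """Sketch of the exchange counter: count real human turns,
    skipping role=user entries whose content is purely tool_result blocks."""
    count = 0
    for msg in messages:
        if msg.get("role") != "user":
            continue
        content = msg.get("content")
        if isinstance(content, list) and content and all(
            b.get("type") == "tool_result" for b in content
        ):
            continue  # pure tool_result payload: not a human turn
        count += 1
    return count

# A mixed message (text block + tool_result) still counts as human input:
msgs = [
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "t1"}]},
    {"role": "user", "content": [{"type": "text", "text": "looks good"},
                                 {"type": "tool_result", "tool_use_id": "t2"}]},
    {"role": "assistant", "content": "..."},
]
print(count_human_messages(msgs))  # prints 1
```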
The dual-site fix (both hooks_cli.py and mempal_save_hook.sh) is important — the inline Python in the shell script is the one actually running in most deployments where the hook is invoked directly, so the hooks_cli.py fix alone wouldn't have been sufficient.
Test coverage is solid. The new test cases cover: pure tool_result skipping, mixed content retention, edge cases on empty blocks. That's exactly the right surface to test.
One minor note: the test at line 82 in the pre-PR version was counting list-content messages, and it's now clearer how _count_human_messages handles mixed content types. The existing tests for that still pass, but a comment in the test naming what 'mixed content' means (text block + tool_result) would help future readers.
Closes #549 cleanly. LGTM.
|
Yeah this is handled already:

```python
if isinstance(content, str):
    if "<command-message>" in content:
        continue
    # counts as human message
elif isinstance(content, list):
    if all(b.get("type") == "tool_result" for b in content):
        continue
```

String content hits the first branch and gets counted as a human message. The list iteration only runs inside the elif branch. |
|
Overall inflation is 600%+ for me, as the problem scales with tool and agent use. This PR greatly helps to fix this. Testing: I compared the default hook behavior ("OLD") against PR #550's counter logic ("NEW") over my parent-session Claude Code transcripts only (depth 2, 66 sessions across 31 projects). Note that Claude Code's Stop hook only ever passes the parent transcript path; the recursive mempalace mine walk is a separate downstream step the PR doesn't touch.
I run a multi-agent dispatch workflow, so my tool_result density is at the high end. The 2.9x in the original issue report is plausibly closer to the median user; my number suggests the bug scales with tool-use intensity (and therefore with teams of agents using tools). Either way the structural fix in #550 looks right. Patched locally, the firing rate now lines up with actual conversation cadence. The documented "hook triggers every 15 human messages" behavior was not working as intended; it was catching lots of agent and tool-use messages. I'm glad to have found this PR and the related issue, as I'd been wrangling with this for the past week. |
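The OLD-vs-NEW comparison described above can be illustrated on a toy transcript (the real measurement ran over JSONL transcript files; the function names here are just stand-ins for the two counting rules):

```python
def count_old(entries):
    # OLD behavior: every role == "user" entry counts as a human message.
    return sum(1 for m in entries if m.get("role") == "user")

def count_new(entries):
    # NEW behavior (PR #550): skip entries whose content is a list of
    # exclusively tool_result blocks.
    n = 0
    for m in entries:
        if m.get("role") != "user":
            continue
        c = m.get("content")
        if isinstance(c, list) and c and all(
            b.get("type") == "tool_result" for b in c
        ):
            continue
        n += 1
    return n

# Toy tool-heavy session: one real human turn, two tool dispatches.
transcript = [
    {"role": "user", "content": "refactor the parser"},
    {"role": "assistant", "content": [{"type": "tool_use", "name": "bash"}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "a"}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "b"}]},
]
old, new = count_old(transcript), count_new(transcript)
print(old, new, f"{old / new:.1f}x")  # prints: 3 1 3.0x
```

The denser the tool-use loop, the larger the ratio, which matches the observation that inflation scales with agent and tool intensity.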
…#549) Claude Code sends tool results as role: "user" messages with content blocks of type "tool_result". The save hook was counting these as human messages, inflating the exchange count ~2.9x in tool-heavy sessions and triggering saves far more often than the intended 15-message interval. Filter out entries where content is a list of exclusively tool_result blocks. Safe no-op for LLMs that use role: "tool" for tool results.
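For context, an entry of the kind the commit message describes looks roughly like this (field values are illustrative, based on the description above, not copied from a real transcript):

```python
# Tool results arrive as role "user" with tool_result content blocks,
# which is why the naive counter misclassified them as human messages.
entry = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "exit 0"}
    ],
}

# The PR's predicate: a user entry whose content is a list made up
# entirely of tool_result blocks is not a human message.
is_pure_tool_result = (
    entry["role"] == "user"
    and isinstance(entry["content"], list)
    and all(b.get("type") == "tool_result" for b in entry["content"])
)
print(is_pure_tool_result)  # prints True
```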
|
Hi, we found an issue in this PR.

Severity: action required | Category: correctness

The all(...) predicate is vacuously true for an empty content list, so a message with content == [] would be skipped. How to fix: require a non-empty content list. Agent prompt to fix - you can give this to your LLM of choice:
We noticed a couple of other issues in this PR as well - happy to share if helpful. Qodo code review - free for open-source. |
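The pitfall flagged above comes from Python's `all()`, which returns True on an empty iterable; a short sketch of the bug and the suggested non-emptiness guard (the helper name is illustrative):

```python
# all() on an empty iterable is vacuously True, so an entry with
# content == [] would be misclassified as "all tool_result" and skipped.
content = []
print(all(b.get("type") == "tool_result" for b in content))  # prints True

# Requiring a non-empty list, as the review suggests, avoids this:
def is_pure_tool_result(content):
    return bool(content) and all(
        b.get("type") == "tool_result" for b in content
    )

print(is_pure_tool_result([]))                        # prints False
print(is_pure_tool_result([{"type": "tool_result"}])) # prints True
```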
|
Fixed in ac65254. Both call sites now require a non-empty content list. Regression test: added coverage for the empty-content edge case. |
Closes #549
Summary
- Skip `role: "user"` messages where all content blocks are `type: "tool_result"` in `_count_human_messages()` (both `hooks_cli.py` and the inline Python in `mempal_save_hook.sh`)
- Claude Code sends tool results as `role: "user"`, inflating the exchange count ~2.9x in tool-heavy sessions; subagent-heavy work triggers saves far more often than the intended 15-message interval
- Safe no-op for LLMs that use `role: "tool"` for tool results

Related
- `hooks_cli.py` was fixed with a different approach; this PR also fixes the inline Python in `mempal_save_hook.sh`

Test plan
- `test_count_skips_tool_results`: verifies single and multi-block tool_result messages are skipped
- `test_count_mixed_content_not_skipped`: verifies messages with both tool_result and text blocks still count
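The two tests in the plan above might look roughly like this (the counter below is a local stand-in for `_count_human_messages()`; the real tests import it from the hook module):

```python
def count_human_messages(messages):
    # Stand-in for _count_human_messages(): skip role=user entries
    # whose content is a non-empty list of only tool_result blocks.
    n = 0
    for m in messages:
        if m.get("role") != "user":
            continue
        c = m.get("content")
        if isinstance(c, list) and c and all(
            b.get("type") == "tool_result" for b in c
        ):
            continue
        n += 1
    return n

def test_count_skips_tool_results():
    msgs = [
        {"role": "user", "content": [{"type": "tool_result"}]},
        {"role": "user", "content": [{"type": "tool_result"},
                                     {"type": "tool_result"}]},
    ]
    assert count_human_messages(msgs) == 0

def test_count_mixed_content_not_skipped():
    msgs = [{"role": "user", "content": [{"type": "tool_result"},
                                         {"type": "text", "text": "hi"}]}]
    assert count_human_messages(msgs) == 1

test_count_skips_tool_results()
test_count_mixed_content_not_skipped()
print("ok")
```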