feat(hooks): opt-in citation cache for source-driven-development #80
Adds a pair of optional Claude Code hooks that cache WebFetch output on disk but revalidate every reuse against the origin. Content is served only when the server returns 304 Not Modified, so source-driven-development's "verify against current docs" guarantee still holds across sessions.

- hooks/sdd-cache-pre.sh: PreToolUse hook. For a cached entry, issues a HEAD with If-None-Match / If-Modified-Since. On 304, blocks the WebFetch (exit 2) and returns cached content via stderr; otherwise allows the fetch through.
- hooks/sdd-cache-post.sh: PostToolUse hook. Captures the response plus the current ETag / Last-Modified. Entries without a validator are never stored: without one, the pre hook cannot verify freshness, and caching would amount to trusting memory.
- Cache key: sha256(url + normalized_prompt). The prompt is lowercased and whitespace-collapsed so stylistic variants hit the same entry; semantically different prompts still miss.
- Hard 24h TTL as a safety net against misbehaving origins.
- hooks/SDD-CACHE.md: opt-in setup, end-to-end testing, debugging.
- .gitignore: ignore the .claude/sdd-cache/ directory.

Hooks are opt-in: users register them in .claude/settings.json. The source-driven-development skill itself is unchanged.
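For readers who want to see the moving parts, here is a minimal sketch of the pre-hook's revalidation step. It is not the contents of hooks/sdd-cache-pre.sh: the function names (conditional_headers, revalidate_status, serve_if_fresh) are hypothetical, the stored validators are assumed to arrive as plain strings and the cached body as a file, whereas the real hook receives its tool input as JSON on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the PreToolUse revalidation step; not the actual
# contents of hooks/sdd-cache-pre.sh.

# Print one conditional-request header per stored validator (either may be empty).
conditional_headers() {
  local etag=$1 last_modified=$2
  if [[ -n $etag ]]; then printf 'If-None-Match: %s\n' "$etag"; fi
  if [[ -n $last_modified ]]; then printf 'If-Modified-Since: %s\n' "$last_modified"; fi
}

# HEAD the URL with those validators, following redirects, and print only the
# status code of the final response (e.g. 304, 200, or 000 on network failure).
revalidate_status() {
  local url=$1 etag=$2 last_modified=$3 header
  local -a args=(--silent --head --location --output /dev/null --write-out '%{http_code}')
  while IFS= read -r header; do
    args+=(--header "$header")
  done < <(conditional_headers "$etag" "$last_modified")
  curl "${args[@]}" "$url"
}

# Exit status 2 from a PreToolUse hook blocks the tool call and hands stderr to
# the model; 0 lets WebFetch run normally.
serve_if_fresh() {
  local status=$1 content_file=$2
  if [[ $status == 304 ]]; then
    cat -- "$content_file" >&2   # cached body, copied verbatim (no shell expansion)
    return 2
  fi
  return 0
}

# Example wiring inside the hook (hypothetical variable names):
#   status=$(revalidate_status "$url" "$etag" "$last_modified")
#   serve_if_fresh "$status" "$entry_dir/content" || exit $?
```

Only a 304 short-circuits the fetch; any other status, including 000 when the origin is unreachable, falls through to a normal WebFetch.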
addyosmani
left a comment
Awesome! Much nicer shape than #74! :) Moving the caching into hooks keeps the skill itself untouched and makes the whole thing opt-in, which addresses most of what I was hesitant about before. Using HTTP validators (ETag / Last-Modified) is also a clever move, since a 304 response is basically the server telling you "my copy hasn't changed," which is more defensible than trusting a local timestamp.
Your three self-flagged concerns make sense.
The TTL on top of validators does feel a little redundant to me - if the origin says 304, it's fresh by definition, and the TTL mostly just adds a ceiling for servers that misbehave. I'd consider dropping it, or at least making it really long (a week?) so it's truly just a safety net.
The prompt-normalization key: WebFetch runs the response through a model with that prompt, so two agents asking slightly different questions will get different outputs from the same URL, and your lowercase/whitespace normalization won't catch that. Might be worth keying on URL only and accepting that the cache hits on "the same page" rather than "the same question about the page."

HEAD reliability is real but probably fine in practice: most doc sites (MDN, react.dev, caniuse) afaik behave correctly, and a missing validator just means no caching, which is the safe failure mode.
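Keying on the URL alone, as suggested here, could look roughly like the sketch below. It is an illustration, not code from this PR: the helper names, the SDD_CACHE_ROOT override and the one-directory-per-key layout under .claude/sdd-cache/ are assumptions. It uses sha256sum (GNU coreutils) when present and falls back to shasum -a 256 (macOS).

```bash
#!/usr/bin/env bash
# Hypothetical URL-only cache key: every prompt about the same page shares one entry.

# Hex SHA-256 of stdin, using whichever tool the platform ships.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then sha256sum; else shasum -a 256; fi | cut -d' ' -f1
}

cache_key() {
  printf '%s' "$1" | sha256_hex
}

# Assumed layout: <root>/<key>/ holds the body, the validators and the prompt
# that produced the reading (stored as metadata only, never part of the key).
cache_dir_for() {
  printf '%s/%s\n' "${SDD_CACHE_ROOT:-.claude/sdd-cache}" "$(cache_key "$1")"
}
```

For example, cache_key abc prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad, the standard SHA-256 test vector, regardless of which prompt is asked about the page.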
Drop the 24h TTL: HTTP validators are the whole freshness contract.

Key the cache on URL alone; prompt-aware keying with normalization gave false safety (semantic differences slipped through). The prompt is kept as metadata and surfaced in the hit message so the next agent can judge whether the earlier reading applies. Reframe docs around "HTTP resource cache, not prompt cache".

While here, fix two latent bugs:
- Replace the unquoted heredoc in the pre-hook with printf. The heredoc expanded $vars and backticks inside cached content, so a compromised doc page could trigger command substitution on cache hit.
- Strip CR before awk paragraph-mode parsing of curl -I -L output so the blank separators between response blocks on a redirect chain are recognised (it was silently picking intermediate headers).

Remove dead -v IGNORECASE=1 (gawk-only; tolower() already handles it).
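Both latent bugs are easy to reproduce in isolation. The helpers below are hypothetical and illustrative rather than the PR's code: they keep only the last header block of curl -I -L output after stripping CRs, look a header up case-insensitively with tolower() (so no gawk-only IGNORECASE is needed), and emit cached text with printf '%s' so nothing inside it is expanded.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the two fixes; not the PR's actual code.

# curl -I -L prints one header block per hop, each line ending in CRLF. awk's
# paragraph mode (RS = "") splits records on empty lines, which only exist
# once the CRs are stripped; otherwise the whole chain is a single record.
last_header_block() {
  tr -d '\r' | awk 'BEGIN { RS = "" } { last = $0 } END { print last }'
}

# Print the value of header $1 from stdin; the name match is case-insensitive.
header_value() {
  awk -v want="$1" '
    BEGIN { want = tolower(want) }
    {
      colon = index($0, ":")
      if (colon > 0 && tolower(substr($0, 1, colon - 1)) == want) {
        value = substr($0, colon + 1)
        sub(/^[ \t]+/, "", value)
        print value
        exit
      }
    }'
}

# Emit cached content literally. An unquoted heredoc (<<EOF) would expand
# $vars and `backticks` found in the cached page; printf '%s' never does.
emit_cached() {
  printf '%s\n' "$1" >&2
}
```

On a 301 → 200 chain, header_value etag applied to last_header_block's output returns the final response's ETag rather than the redirect's.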
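Both right, thanks. Dropped the TTL and took the prompt out of the key; the normalize trick was giving false safety anyway. The prompt now lives as metadata and shows up in the hit message so the next agent can judge if the earlier reading fits. Docs rewritten around "HTTP resource cache, not prompt cache". Two things I hit while in there: the unquoted heredoc in the pre-hook (cached content could trigger command substitution on a hit) and the missing CR strip before awk parsed curl -I -L output on redirect chains. Details are in the commit message above.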
Pushed as 4743df9.
Thanks for the updates, @federicobartoli! Just to check, is the PR still considered in draft/WIP or are you at the stage where you'd like a final pass review for merge consideration?
Sorry @addyosmani, my bad, I forgot to flip it out of draft after addressing your comments. Just marked it ready for review. Final pass whenever you have time 🙏
addyosmani
left a comment
LGTM overall after doing another pass. Thanks, @federicobartoli!
Summary
Re-opens #74 with a completely different approach after @addyosmani's
feedback. The citation cache now lives in two optional Claude Code
hooks, not inside the skill.
- hooks/sdd-cache-pre.sh: PreToolUse on WebFetch. If a cached entry exists, it issues a HEAD with If-None-Match / If-Modified-Since. On 304 Not Modified it blocks the fetch (exit 2) and returns the cached content via stderr. Otherwise it lets the fetch through.
- hooks/sdd-cache-post.sh: PostToolUse on WebFetch. Captures the response together with the origin's current ETag / Last-Modified. Entries without a validator are never stored.

The source-driven-development skill itself is untouched.

I’m considering moving this to draft because I still have a few concerns around correctness and long-term behavior.
TTL as safety net
I understand the 24h TTL is meant to mitigate misbehaving origins, but it can also introduce unnecessary invalidations. In practice, many documentation sites are relatively stable, so entries may be evicted even when the origin would still return 304 Not Modified.

This means we might lose valid cache hits and fall back to a full fetch despite the content being unchanged, even though freshness is already delegated to the origin via HTTP validators. I’m wondering if relying entirely on validators (and making TTL optional or configurable) would lead to more consistent behavior.
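If a ceiling is kept at all, one way to make it optional is sketched below. SDD_CACHE_MAX_AGE is a hypothetical environment variable (in seconds), not something the hooks read; when it is unset or 0 the check is skipped, so by default the origin's 304 remains the only freshness signal.

```bash
#!/usr/bin/env bash
# Hypothetical opt-in age ceiling layered on top of HTTP validators.
# SDD_CACHE_MAX_AGE (seconds) is an invented knob; unset or 0 means "no TTL".

# Succeeds if an entry stored at epoch second $1 may still be revalidated.
# $2 overrides "now" (useful in tests); it defaults to the current time.
entry_within_max_age() {
  local stored_at=$1 now=${2:-$(date +%s)}
  local max_age=${SDD_CACHE_MAX_AGE:-0}
  if (( max_age <= 0 )); then
    return 0          # no ceiling configured: defer entirely to the validators
  fi
  (( now - stored_at <= max_age ))
}

# Example: a one-week safety net, as floated in review.
#   export SDD_CACHE_MAX_AGE=604800
```

Entries that fail this check would simply be treated as a miss and refetched, which keeps the failure mode on the safe side.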
Prompt stability as part of the cache key
Since the cache key depends on (url + normalized_prompt), the hit rate relies on the agent reusing the same prompt across sessions. Given that prompt phrasing is often not stable (even for the same intent), I’m wondering how reliable this is in real workflows, especially over time or across different agents or model versions.

In my tests (using Claude Haiku), this worked quite well in practice. Once a page had been fetched (e.g. useState), subsequent requests with similar prompts consistently hit the cache, so the agent effectively read from the cached content instead of triggering a new fetch. However, I’m not sure how stable this behavior is under prompt drift, when switching agents or models, or over longer-lived sessions.
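The (url + normalized_prompt) key under discussion reduces to a couple of tr calls. This sketch reproduces it for illustration only: the helper names are hypothetical, and joining URL and normalized prompt with a newline before hashing is an assumption, since the description only says url + normalized_prompt.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction of the (url + normalized_prompt) cache key.

# Hex SHA-256 of stdin (GNU sha256sum, or shasum on macOS).
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then sha256sum; else shasum -a 256; fi | cut -d' ' -f1
}

# Lowercase, then turn every run of whitespace (spaces, tabs, newlines) into one space.
normalize_prompt() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -s '[:space:]' ' '
}

# Assumption: URL and normalized prompt are joined with a newline before hashing.
prompt_cache_key() {
  local url=$1 prompt=$2
  printf '%s\n%s' "$url" "$(normalize_prompt "$prompt")" | sha256_hex
}
```

"Extract the signature" and "extract the\nsignature" collapse to the same string and therefore the same key, while any rewording changes the hash, which is exactly why the hit rate depends on prompt stability.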
Use of HEAD for revalidation
This is the part I’m least confident about. Many servers don’t implement HEAD consistently with GET (headers, caching behavior, etc.), which could lead to incorrect 304s or missed invalidations. Maybe a conditional GET would be more robust in practice, even if slightly more expensive, since it guarantees consistency with the actual fetch path.

That said, since this lives entirely in hooks and doesn’t affect the skill itself, I agree the risk surface is relatively contained and it’s easy to experiment with.
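For comparison, a conditional GET sends the same validators on the method the real fetch uses. A minimal curl-based sketch follows; the function name and argument order are hypothetical. The extra cost shows up directly: when the page has changed, the body is downloaded here and then fetched again by WebFetch.

```bash
#!/usr/bin/env bash
# Hypothetical conditional-GET revalidation: same validators as the HEAD
# variant, but sent on GET, the method the real fetch also uses.

# Prints the final status code; any body (on 200) is written to $4.
conditional_get() {
  local url=$1 etag=$2 last_modified=$3 body_out=$4
  local -a args=(--silent --location --output "$body_out" --write-out '%{http_code}')
  if [[ -n $etag ]]; then args+=(--header "If-None-Match: $etag"); fi
  if [[ -n $last_modified ]]; then args+=(--header "If-Modified-Since: $last_modified"); fi
  curl "${args[@]}" "$url"
}

# Usage (hypothetical variable names):
#   status=$(conditional_get "$url" "$etag" "$last_modified" /dev/null)
#   [[ $status == 304 ]] && echo "unchanged"
```

Writing the body to /dev/null keeps the hook's behavior identical to the HEAD variant apart from the method; keeping the body instead would let the hook refresh the cache, at the cost of a larger change to the design.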
More generally: should HTTP validators be treated as the single source of truth for freshness, or is it acceptable to layer additional heuristics (like TTL) on top?
How this addresses the previous feedback on #74
- … 304. A hit is a just-completed verification, not a memory read.
- … ETag / Last-Modified. A hard 24h TTL is the safety net for misbehaving origins.
- … WebFetch, the canonical tool-level extension point.

How it works
Cache key: sha256(url + normalized_prompt) (lowercase + whitespace collapse). WebFetch output is prompt-dependent, so different prompts on the same URL are separate entries. Stylistic variants ("Extract the signature" vs "extract the\nsignature") hit the same key; semantically different prompts still miss.
Full setup, testing, and debugging docs: hooks/SDD-CACHE.md.

Opt-in
Nothing happens until users explicitly register the hooks in .claude/settings.json. The plugin manifest is not modified. Follows the same opt-in pattern used by simplify-ignore.sh in this repo.
Test plan
- … (url, prompt) on cache miss, stores ETag / Last-Modified.
- … 304.
- … 304 (cache miss or server change).
- … cache entry; semantically different prompts miss.
- … ETag / Last-Modified are skipped on write and removed on next revalidation attempt).
- … react.dev): WebFetch → cached → same prompt re-fetch → 304 → cached content returned to the agent.
- … hooks/SDD-CACHE.md.

Closes #74 (once this is accepted or rejected).