
refactor(searcher): hoist CLOSET_RANK_BOOSTS to module level + record ablation finding #1378

Open
jphein wants to merge 4 commits into MemPalace:develop from jphein:refactor/searcher-hoist-closet-boost-constants

Conversation


@jphein jphein commented May 6, 2026

Summary

  • Move CLOSET_RANK_BOOSTS and CLOSET_DISTANCE_CAP from inside search_memories to module scope so they're patchable for A/B benchmarking without touching the function.
  • Add a comment block alongside the constants recording an empirical ablation against a 151K-drawer corpus that found the closet boost is mostly inert on chat-heavy content.

Pure refactor + comment — no behavior change.
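For readers skimming the diff, the shape of the change is roughly this. A minimal sketch using the constant values quoted later in the review thread; the helper name `apply_closet_boost` and its row format are illustrative, not the real `search_memories` signature:

```python
# Module-level now (was local to search_memories), so benchmarks can
# rebind the module attributes without editing the function.
CLOSET_RANK_BOOSTS = (0.40, 0.25, 0.15, 0.08, 0.04)
CLOSET_DISTANCE_CAP = 1.5  # cosine dist > 1.5 = too weak to use as signal


def apply_closet_boost(rows):
    """Illustrative stand-in for the boost step inside search_memories.

    Each row is (score, closet_rank_or_None, cosine_distance); rows whose
    closet rank is within the boost table and whose distance clears the
    cap get the rank-indexed boost added to their score.
    """
    boosted = []
    for score, closet_rank, distance in rows:
        if (closet_rank is not None
                and closet_rank < len(CLOSET_RANK_BOOSTS)
                and distance <= CLOSET_DISTANCE_CAP):
            score += CLOSET_RANK_BOOSTS[closet_rank]
        boosted.append(score)
    return boosted
```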

Why

The rank-based closet-boost has been the subject of #1129 (VecRecall: "organization-layer involvement in retrieval reduces R@5"). Before deciding whether to act on that critique, I ran a 12-probe A/B against my 151K-drawer canonical palace, default-vs-zeroed boosts. Findings (now captured in code):

  • Boost fires on ~20% of result rows, concentrated in queries whose answer lives in mined files; closets are sparse on chat-transcript queries (most fork-side decisions).
  • When the boost fired, it re-ordered chunks within a single source file rather than displacing right answers with wrong ones — the failure mode VecRecall describes did not reproduce on this corpus.
  • Net: the hybrid degrades to effectively pure-vector for transcript queries and re-ranks within-file chunks for mined-file queries. Neither matches the failure mode VecRecall is fixing.

The hoist itself is benign and useful — it lets future tuners swap the constants from outside the function (env var, config flag, in-process patch). The ablation comment lives next to the constants so future-us doesn't have to re-run the experiment.

Scope

  • mempalace/searcher.py: +21 / -6
  • 27/27 searcher tests still pass; full suite green locally on Linux Py3.12.
  • No new dependencies, no public API change.

Provenance

This was filed in response to @igorls's invitation in his close of `#1286` to file the closet-boost refactor as a small standalone PR. Two commits, surgical, against current develop.

🤖 Generated with Claude Code

jphein added 2 commits May 6, 2026 02:38
A/B ablation 2026-04-27 against the 151K canonical palace (12-probe set
mixing recent fork-side decisions with mined-file content). Closet boost
fires on ~20% of result rows, concentrated in queries with answers in
mined files; sparse on chat-transcript queries. When the boost fired,
it re-ordered chunks within a single source file rather than displacing
right answers with wrong ones.

VecRecall's critique (MemPalace#1129 — "organization-layer involvement in
retrieval reduces R@5") did not reproduce on this corpus. The hybrid
degrades to effectively pure-vector for transcript queries and
re-ranks within-file chunks for mined-file queries, neither of which
matches the failure mode VecRecall is fixing.

Captured as a comment next to the constants so future-us doesn't have
to re-run the experiment to learn the boost is mostly inert on
chat-heavy corpora.

Refs: scratch experiment was /tmp/closet-boost-ab.py (not committed,
re-creatable from this comment + the hoist in f558d3c).
Copilot AI review requested due to automatic review settings May 6, 2026 09:39

Copilot AI left a comment


Pull request overview

This PR refactors the closet-boost tuning knobs in search_memories() by hoisting CLOSET_RANK_BOOSTS and CLOSET_DISTANCE_CAP to module scope, and adds an in-code comment capturing an empirical ablation result relevant to ongoing retrieval-quality discussion (#1129). The intent is to keep runtime behavior unchanged while making the boost parameters patchable for benchmarking/tuning.

Changes:

  • Hoist CLOSET_RANK_BOOSTS and CLOSET_DISTANCE_CAP from inside search_memories() to module-level constants.
  • Add a module-level comment block documenting an A/B ablation on a large corpus and summarizing observed impact.


Comment thread mempalace/searcher.py Outdated
Comment on lines +185 to +186
CLOSET_RANK_BOOSTS = [0.40, 0.25, 0.15, 0.08, 0.04]
CLOSET_DISTANCE_CAP = 1.5 # cosine dist > 1.5 = too weak to use as signal
Collaborator Author


Good catch — fixed in 1474699. CLOSET_RANK_BOOSTS is now a tuple, so a stray append/__setitem__ on the imported attribute can't silently reshape global ranking. The ablation workflow that motivated the hoist still works because it rebinds the module attribute (searcher.CLOSET_RANK_BOOSTS = (0,) * 5) rather than mutating in place — see scripts/closet_boost_ablation.py.
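The distinction the fix relies on, in miniature (`try_mutate` is an illustrative helper, not project code):

```python
CLOSET_RANK_BOOSTS = (0.40, 0.25, 0.15, 0.08, 0.04)


def try_mutate(boosts):
    """Attempt an in-place write; report whether Python rejected it.

    On a tuple the assignment raises TypeError (the stray-append hazard
    is a hard error); on a list it silently succeeds, which is exactly
    the global-ranking reshape the review flagged.
    """
    try:
        boosts[0] = 0.0
        return False  # mutation went through silently
    except TypeError:
        return True   # mutation rejected

# Rebinding the module attribute, as the ablation workflow does, still
# works: searcher.CLOSET_RANK_BOOSTS = (0.0,) * 5 replaces the binding
# without touching the original tuple.
```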

Comment thread mempalace/searcher.py Outdated
Comment on lines +175 to +184
# Empirical note (A/B ablation 2026-04-27 on the 151K canonical palace,
# 12-probe set covering recent fork-side work + transcript content):
# boost fires on ~20% of result rows, concentrated in queries whose
# answer lives in mined files; closets are sparse on chat-transcript
# queries (most fork-side decisions). When the boost did fire, it
# re-ordered chunks within a single source file rather than displacing
# right answers with wrong ones — i.e., VecRecall's critique
# (https://github.com/MemPalace/mempalace/discussions/1129, "org-layer
# in retrieval path drops R@5") didn't reproduce here. Kept as a
# rare-but-cheap signal; ablation script lived in /tmp, not committed.
Collaborator Author


Fair — fixed in 1474699 by committing scripts/closet_boost_ablation.py. Takes a palace path + optional probe set on stdin, runs the search under default boosts and zero boosts, prints a per-query delta showing whether the boost re-ordered candidates or just nudged. Lazy-imports the searcher module so --help is usable without chromadb. Comment block now points at the committed script.
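An illustrative skeleton of that lazy-import layout; flag names and the import path here are assumptions, and the committed scripts/closet_boost_ablation.py is the authoritative version:

```python
import argparse


def build_parser():
    """CLI surface that parses (and prints --help) with no heavy imports."""
    parser = argparse.ArgumentParser(
        description="Closet-boost A/B ablation: default vs. zeroed boosts")
    parser.add_argument("palace_path", help="path to the palace to probe")
    parser.add_argument("--zero-boosts", action="store_true",
                        help="run the zeroed-boost arm only")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Lazy import: the searcher module (and chromadb behind it) loads only
    # once a real run starts, so --help works without chromadb installed.
    from mempalace import searcher  # hypothetical import path from this PR
    _ = (searcher, args)  # sketch ends here; the real script runs both arms
```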

…on reproducer

Two corrections from @copilot-pull-request-reviewer on MemPalace#1378:

1. CLOSET_RANK_BOOSTS was tagged "constant" but typed as a mutable
   list — a downstream import that did
   ``searcher.CLOSET_RANK_BOOSTS.append(...)`` would have silently
   reshaped global ranking. Switched to a tuple. The ablation
   workflow that motivated the hoist still works: callers patch at
   the module-attribute level (``searcher.CLOSET_RANK_BOOSTS = (...)``)
   rather than mutating in place.

2. The empirical comment claimed "ablation script lived in /tmp,
   not committed" — which made the finding unreproducible by anyone
   else. Committed scripts/closet_boost_ablation.py: takes a palace
   path + optional probe set on stdin, runs the search under default
   boosts and zero boosts, reports per-query whether the boost
   re-ordered or just nudged. Lazy imports the searcher module so
   --help works without chromadb installed.

Comment updated to point at the committed script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jphein jphein requested a review from igorls as a code owner May 6, 2026 09:56
CI lint reformat-check caught the new script. No semantic change.
@jphein
Collaborator Author

jphein commented May 6, 2026

@igorls, friendly nudge: this one is small (+21 / -6 in mempalace/searcher.py plus one new scripts/closet_boost_ablation.py), all 6 CI checks green, Copilot review responded to. Wondering whether it'd fit the v3.3.5 batch alongside #1370; happy to defer to v3.3.6 if you'd rather batch the changelog this round. Just flagging timing: the ablation note inside it speaks to #1129 (VecRecall's R@5 critique), and v3.3.5 inclusion would give that conversation its empirical anchor sooner. No rush either way; thanks for the work on #1377.
