
refactor(searcher): hoist CLOSET_RANK_BOOSTS to module level + record ablation finding #1378

Open
jphein wants to merge 4 commits into MemPalace:develop from jphein:refactor/searcher-hoist-closet-boost-constants

Conversation


@jphein jphein commented May 6, 2026

Summary

  • Move CLOSET_RANK_BOOSTS and CLOSET_DISTANCE_CAP from inside search_memories to module scope so they're patchable for A/B benchmarking without touching the function.
  • Add a comment block alongside the constants recording an empirical ablation against a 151K-drawer corpus that found the closet boost is mostly inert on chat-heavy content.

Pure refactor + comment — no behavior change.
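For readers skimming the diff, the shape of the change is roughly this. A minimal sketch using the constant values quoted later in the review thread; the helper name `apply_closet_boost` and its row format are illustrative, not the real `search_memories` signature:

```python
# Module-level now (was local to search_memories), so benchmarks can
# rebind the module attributes without editing the function.
CLOSET_RANK_BOOSTS = (0.40, 0.25, 0.15, 0.08, 0.04)
CLOSET_DISTANCE_CAP = 1.5  # cosine dist > 1.5 = too weak to use as signal


def apply_closet_boost(rows):
    """Illustrative stand-in for the boost step inside search_memories.

    Each row is (score, closet_rank_or_None, cosine_distance); rows whose
    closet rank is within the boost table and whose distance clears the
    cap get the rank-indexed boost added to their score.
    """
    boosted = []
    for score, closet_rank, distance in rows:
        if (closet_rank is not None
                and closet_rank < len(CLOSET_RANK_BOOSTS)
                and distance <= CLOSET_DISTANCE_CAP):
            score += CLOSET_RANK_BOOSTS[closet_rank]
        boosted.append(score)
    return boosted
```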

Why

The rank-based closet-boost has been the subject of #1129 (VecRecall: "organization-layer involvement in retrieval reduces R@5"). Before deciding whether to act on that critique, I ran a 12-probe A/B against my 151K-drawer canonical palace, default-vs-zeroed boosts. Findings (now captured in code):

  • Boost fires on ~20% of result rows, concentrated in queries whose answer lives in mined files; closets are sparse on chat-transcript queries (most fork-side decisions).
  • When the boost fired, it re-ordered chunks within a single source file rather than displacing right answers with wrong ones — the failure mode VecRecall describes did not reproduce on this corpus.
  • Net: the hybrid degrades to effectively pure-vector for transcript queries and re-ranks within-file chunks for mined-file queries. Neither matches the failure mode VecRecall is fixing.

The hoist itself is benign and useful — it lets future tuners swap the constants from outside the function (env var, config flag, in-process patch). The ablation comment lives next to the constants so future-us doesn't have to re-run the experiment.

Scope

  • mempalace/searcher.py: +21 / -6
  • 27/27 searcher tests still pass; full suite green locally on Linux Py3.12.
  • No new dependencies, no public API change.

Provenance

This was filed in response to @igorls's invitation in his close of `#1286` to file the closet-boost refactor as a small standalone PR. Two commits, surgical, against current develop.

🤖 Generated with Claude Code

jphein added 2 commits May 6, 2026 02:38
A/B ablation 2026-04-27 against the 151K canonical palace (12-probe set
mixing recent fork-side decisions with mined-file content). Closet boost
fires on ~20% of result rows, concentrated in queries with answers in
mined files; sparse on chat-transcript queries. When the boost fired,
it re-ordered chunks within a single source file rather than displacing
right answers with wrong ones.

VecRecall's critique (MemPalace#1129 — "organization-layer involvement in
retrieval reduces R@5") did not reproduce on this corpus. The hybrid
degrades to effectively pure-vector for transcript queries and
re-ranks within-file chunks for mined-file queries, neither of which
matches the failure mode VecRecall is fixing.

Captured as a comment next to the constants so future-us doesn't have
to re-run the experiment to learn the boost is mostly inert on
chat-heavy corpora.

Refs: scratch experiment was /tmp/closet-boost-ab.py (not committed,
re-creatable from this comment + the hoist in f558d3c).
Copilot AI review requested due to automatic review settings May 6, 2026 09:39

Copilot AI left a comment


Pull request overview

This PR refactors the closet-boost tuning knobs in search_memories() by hoisting CLOSET_RANK_BOOSTS and CLOSET_DISTANCE_CAP to module scope, and adds an in-code comment capturing an empirical ablation result relevant to ongoing retrieval-quality discussion (#1129). The intent is to keep runtime behavior unchanged while making the boost parameters patchable for benchmarking/tuning.

Changes:

  • Hoist CLOSET_RANK_BOOSTS and CLOSET_DISTANCE_CAP from inside search_memories() to module-level constants.
  • Add a module-level comment block documenting an A/B ablation on a large corpus and summarizing observed impact.


Comment thread mempalace/searcher.py Outdated
Comment on lines +185 to +186
CLOSET_RANK_BOOSTS = [0.40, 0.25, 0.15, 0.08, 0.04]
CLOSET_DISTANCE_CAP = 1.5 # cosine dist > 1.5 = too weak to use as signal
Collaborator Author


Good catch — fixed in 1474699. CLOSET_RANK_BOOSTS is now a tuple, so a stray append/__setitem__ on the imported attribute can't silently reshape global ranking. The ablation workflow that motivated the hoist still works because it rebinds the module attribute (searcher.CLOSET_RANK_BOOSTS = (0,) * 5) rather than mutating in place — see scripts/closet_boost_ablation.py.
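The distinction the fix relies on, in miniature (`try_mutate` is an illustrative helper, not project code):

```python
CLOSET_RANK_BOOSTS = (0.40, 0.25, 0.15, 0.08, 0.04)


def try_mutate(boosts):
    """Attempt an in-place write; report whether Python rejected it.

    On a tuple the assignment raises TypeError (the stray-append hazard
    is a hard error); on a list it silently succeeds, which is exactly
    the global-ranking reshape the review flagged.
    """
    try:
        boosts[0] = 0.0
        return False  # mutation went through silently
    except TypeError:
        return True   # mutation rejected

# Rebinding the module attribute, as the ablation workflow does, still
# works: searcher.CLOSET_RANK_BOOSTS = (0.0,) * 5 replaces the binding
# without touching the original tuple.
```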

Comment thread mempalace/searcher.py Outdated
Comment on lines +175 to +184
# Empirical note (A/B ablation 2026-04-27 on the 151K canonical palace,
# 12-probe set covering recent fork-side work + transcript content):
# boost fires on ~20% of result rows, concentrated in queries whose
# answer lives in mined files; closets are sparse on chat-transcript
# queries (most fork-side decisions). When the boost did fire, it
# re-ordered chunks within a single source file rather than displacing
# right answers with wrong ones — i.e., VecRecall's critique
# (https://github.com/MemPalace/mempalace/discussions/1129, "org-layer
# in retrieval path drops R@5") didn't reproduce here. Kept as a
# rare-but-cheap signal; ablation script lived in /tmp, not committed.
Collaborator Author


Fair — fixed in 1474699 by committing scripts/closet_boost_ablation.py. Takes a palace path + optional probe set on stdin, runs the search under default boosts and zero boosts, prints a per-query delta showing whether the boost re-ordered candidates or just nudged. Lazy-imports the searcher module so --help is usable without chromadb. Comment block now points at the committed script.
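An illustrative skeleton of that lazy-import layout; flag names and the import path here are assumptions, and the committed scripts/closet_boost_ablation.py is the authoritative version:

```python
import argparse


def build_parser():
    """CLI surface that parses (and prints --help) with no heavy imports."""
    parser = argparse.ArgumentParser(
        description="Closet-boost A/B ablation: default vs. zeroed boosts")
    parser.add_argument("palace_path", help="path to the palace to probe")
    parser.add_argument("--zero-boosts", action="store_true",
                        help="run the zeroed-boost arm only")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Lazy import: the searcher module (and chromadb behind it) loads only
    # once a real run starts, so --help works without chromadb installed.
    from mempalace import searcher  # hypothetical import path from this PR
    _ = (searcher, args)  # sketch ends here; the real script runs both arms
```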

…on reproducer

Two corrections from @copilot-pull-request-reviewer on MemPalace#1378:

1. CLOSET_RANK_BOOSTS was tagged "constant" but typed as a mutable
   list — a downstream import that did
   ``searcher.CLOSET_RANK_BOOSTS.append(...)`` would have silently
   reshaped global ranking. Switched to a tuple. The ablation
   workflow that motivated the hoist still works: callers patch at
   the module-attribute level (``searcher.CLOSET_RANK_BOOSTS = (...)``)
   rather than mutating in place.

2. The empirical comment claimed "ablation script lived in /tmp,
   not committed" — which made the finding unreproducible by anyone
   else. Committed scripts/closet_boost_ablation.py: takes a palace
   path + optional probe set on stdin, runs the search under default
   boosts and zero boosts, reports per-query whether the boost
   re-ordered or just nudged. Lazy imports the searcher module so
   --help works without chromadb installed.

Comment updated to point at the committed script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jphein jphein requested a review from igorls as a code owner May 6, 2026 09:56
CI lint reformat-check caught the new script. No semantic change.
@jphein
Collaborator Author

jphein commented May 6, 2026

@igorls, friendly nudge: this one is small (+21 / -6 in mempalace/searcher.py plus one new scripts/closet_boost_ablation.py), all 6 CI checks green, Copilot review responded to. Wondering whether it'd fit the v3.3.5 batch alongside #1370; happy to defer to v3.3.6 if you'd rather batch the changelog this round. Just flagging timing: the ablation note inside it speaks to #1129 (VecRecall's R@5 critique), and v3.3.5 inclusion would give that conversation its empirical anchor sooner. No rush either way; thanks for the work on #1377.
