Skip to content

feat(searcher): warnings + sqlite BM25 top-up when vector underdelivers#1005

Open
jphein wants to merge 2 commits intoMemPalace:developfrom
jphein:feat/search-warnings-sqlite-fallback
Open

feat(searcher): warnings + sqlite BM25 top-up when vector underdelivers#1005
jphein wants to merge 2 commits intoMemPalace:developfrom
jphein:feat/search-warnings-sqlite-fallback

Conversation

@jphein
Copy link
Copy Markdown
Collaborator

@jphein jphein commented Apr 18, 2026

Summary

When ChromaDB's HNSW index is sparse, drifted, or rejects a filter (#951's Error finding id), today's search_memories returns fewer hits than the palace actually contains — silently. A user querying for drawers that exist in sqlite sees an empty or partial result and concludes the palace has forgotten, when only the vector ranking layer is degraded.

Silent hit-miss is worse than a crash because callers can't detect it. This is especially painful:

Contract change

search_memories() no longer hard-fails on vector errors — it always returns a result (except "error": "No palace found"). The return dict gains two fields:

"warnings": ["vector search unavailable: ...", ...]  # why we couldn't get more
"available_in_scope": 5243                            # sqlite-authoritative scope count

Behavior:

  1. Run the vector query. On exception, append a warning ("vector search unavailable: <err>") and continue — do not raise.
  2. After hybrid ranking, count sqlite drawers matching the scope (wing/room filter, or total). That's the ceiling of what should have been returned.
  3. If len(hits) < n_results and the caller did NOT set max_distance (strict-similarity mode), top up from the sqlite pool via BM25 keyword ranking.
    • Fallback hits are tagged matched_via="sqlite_bm25_fallback".
    • distance/similarity are None since there's no vector score.
    • Drawers with zero query-term overlap are skipped — no padding with arbitrary content.
  4. If scope > returned, add a warning pointing at mempalace repair.

CLI search() now delegates to search_memories so both paths share the fallback and warning surface. Warnings print with a ! prefix above the results:

============================================================
  Results for: "kiyo xhci"
  Wing: kiyo-xhci-fix
  Scope has: 5243 drawers matching filter
  ! vector search returned 0 of 5 requested; filled 5 from sqlite+BM25 keyword match
============================================================

Relation to #951 and #823

Complexity

Ruff flagged search_memories (C901) after the additions. Extracted the fallback + scope-count into _sqlite_fallback_and_scope() to stay under threshold without losing any local context.

Tests

  • test_search_memories_fills_from_sqlite_when_vector_underdelivers — mocks vector returning 1 hit, sqlite having 4 drawers in scope (2 with query-term matches, 1 without); asserts 2 fallback hits get promoted via BM25 and the one with zero matches is skipped; asserts available_in_scope == 4 and the warning text.
  • test_search_memories_query_error_degrades_to_warning — vector raises, warning surfaced, no hard failure.
  • test_search_query_error_degrades_to_warning — CLI no longer raises on vector failure, prints the warning.

4 prior tests updated for the new contract (scope count, warnings list, no SearchError on pure query failure).

  • pytest tests/974 passed
  • ruff check / ruff format --check — clean

Scope / non-goals

Copilot AI review requested due to automatic review settings April 18, 2026 18:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates MemPalace search behavior to avoid silent underdelivery when ChromaDB’s vector index fails or returns too few hits by surfacing warnings and optionally topping up results from SQLite using BM25 keyword ranking. It also routes the CLI search() through search_memories() so both programmatic and CLI callers share the same fallback/warning behavior.

Changes:

  • Make search_memories() degrade vector query failures into "warnings" instead of returning an "error" (except for “No palace found”).
  • Add SQLite scope accounting + BM25 top-up fallback when vector results underdeliver and max_distance filtering is disabled.
  • Update CLI search() to delegate to search_memories() and print warnings/scope information; update tests for the new contract.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
mempalace/searcher.py Adds warnings + SQLite/BM25 fallback logic and unifies CLI and API search behavior via search_memories().
tests/test_searcher.py Updates/extends tests to validate warning surfacing and fallback behavior instead of hard failures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/searcher.py Outdated
Comment on lines +319 to +333
pool_kwargs: dict = {"include": ["documents", "metadatas"]}
if where:
pool_kwargs["where"] = where
# Cap the pool — sqlite-variable limits (#950) bite around 32k, and
# a fallback pool 20x the request is plenty for BM25 ranking.
pool_kwargs["limit"] = max(n_results * 20, 100)
pool = drawers_col.get(**pool_kwargs)
except Exception as e:
warnings.append(f"sqlite fallback unavailable: {e}")
return available_in_scope, warnings

pool_docs = pool.get("documents") or []
pool_metas = pool.get("metadatas") or []
available_in_scope = len(pool_docs)

Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

available_in_scope is computed as len(pool_docs) after a get(limit=max(n_results*20, 100)), so it will never exceed that limit and is not an authoritative “scope count”. This makes the CLI scope line and the later “unreachable via HNSW” warning potentially misleading on large palaces. Consider computing the full in-scope count separately (e.g., drawers_col.count(where=where) if safe, or paginating get(include=["ids"], limit=..., offset=...) to count) and keep the limited get(include=["documents","metadatas"]) only for BM25 fallback candidates.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit d0f4e32 (squashed rebase of the original b4cbdfd). _count_in_scope() now paginates col.get(include=[], limit=5000, offset=K) to compute an authoritative scope total — independent of the (capped) fallback pool — so available_in_scope reflects sqlite truth, not the pool ceiling. Pagination keeps each query well under #950's SQL-variable limit.

Comment thread mempalace/searcher.py Outdated
Comment on lines +603 to +609
# Surface unreachable data: scope has more matches than we could rank.
if available_in_scope is not None and available_in_scope > len(hits) and len(hits) < n_results:
warnings.append(
f"{available_in_scope} drawers match this scope in sqlite; "
f"{len(hits)} ranked — the rest are unreachable via the current "
f"HNSW index. Run `mempalace repair` to rebuild."
)
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Run mempalace repair” warning is currently gated on len(hits) < n_results. If the sqlite BM25 fallback fills the request to n_results, this warning won’t surface even when available_in_scope > len(hits) (i.e., lots of drawers exist in-scope but couldn’t be vector-ranked). If the intended contract is “warn whenever scope > returned”, drop the len(hits) < n_results condition (or otherwise key it off vector underdelivery rather than the final hit count).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in d0f4e32. The "Run `mempalace repair`" warning now gates off a vector_underdelivered flag captured before the BM25 fallback runs, instead of len(hits) < n_results. So when sqlite top-up papers over a degraded vector path, the user still sees that the vector index is the under-delivering layer.

Comment thread mempalace/searcher.py Outdated
Comment on lines +249 to +252
print(f"\n {result['error']}")
if "hint" in result:
print(f" {result['hint']}")
raise SearchError(result["error"])
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI search() used to print the missing-palace path (No palace found at <path>). After delegating to search_memories(), it now prints only No palace found (no path), which is less actionable when multiple palaces/paths are involved. Consider including palace_path in the printed message (either by enriching the search_memories error payload or formatting it in search()).

Suggested change
print(f"\n {result['error']}")
if "hint" in result:
print(f" {result['hint']}")
raise SearchError(result["error"])
error_message = result["error"]
if error_message == "No palace found":
error_message = f"{error_message} at {palace_path}"
print(f"\n {error_message}")
if "hint" in result:
print(f" {result['hint']}")
raise SearchError(error_message)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in d0f4e32. CLI search() now preserves the palace path explicitly when printing the error: structured payload from search_memories stays path-agnostic (right for the API contract), but the CLI prepends at <palace_path> so users still see which palace failed to open.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
Three corrections from @copilot-pull-request-reviewer:

1. available_in_scope was capped at n_results*20 via the pool get(),
   so the "sqlite-authoritative" framing was misleading on large
   palaces (would report at most 100 even when 5000+ drawers matched).
   Split the count out of the pool fetch: new _count_in_scope()
   paginates col.get(include=[], limit=5000, offset=K) to sum an
   authoritative total. Pagination keeps each query well under
   MemPalace#950's SQL-variable limit.

2. The "Run mempalace repair" warning was gated on
   len(hits) < n_results, so it never surfaced when the BM25 fallback
   filled the request — even though the vector index was the actual
   degraded layer. Added a vector_underdelivered flag captured before
   fallback runs; the warning now gates off that flag so users see
   "HNSW can't reach the rest" whenever the vector path fell short,
   regardless of how much BM25 papered over.

3. CLI "No palace found" lost the palace path after MemPalace#1005's refactor
   to delegate through search_memories(). Preserve it explicitly in
   search() — the structured error payload stays path-agnostic, but
   the CLI adds "at <palace_path>" when printing so users still see
   which palace failed to open.

Also updated _sqlite_fallback_and_scope() signature to take
vector_underdelivered separately from allow_fallback, matching what
the two warnings now need to distinguish.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
Three corrections from @copilot-pull-request-reviewer:

1. available_in_scope was capped at n_results*20 via the pool get(),
   so the "sqlite-authoritative" framing was misleading on large
   palaces (would report at most 100 even when 5000+ drawers matched).
   Split the count out of the pool fetch: new _count_in_scope()
   paginates col.get(include=[], limit=5000, offset=K) to sum an
   authoritative total. Pagination keeps each query well under
   MemPalace#950's SQL-variable limit.

2. The "Run mempalace repair" warning was gated on
   len(hits) < n_results, so it never surfaced when the BM25 fallback
   filled the request — even though the vector index was the actual
   degraded layer. Added a vector_underdelivered flag captured before
   fallback runs; the warning now gates off that flag so users see
   "HNSW can't reach the rest" whenever the vector path fell short,
   regardless of how much BM25 papered over.

3. CLI "No palace found" lost the palace path after MemPalace#1005's refactor
   to delegate through search_memories(). Preserve it explicitly in
   search() — the structured error payload stays path-agnostic, but
   the CLI adds "at <palace_path>" when printing so users still see
   which palace failed to open.

Also updated _sqlite_fallback_and_scope() signature to take
vector_underdelivered separately from allow_fallback, matching what
the two warnings now need to distinguish.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
README:
- Fork-changes table: expand the None-metadata row to cover all 8 sites
  (searcher.py CLI + API + closet-boost, miner.status, 4 mcp_server
  handlers). Previous row only called out the CLI print path.
- Add a new Search row: warnings + sqlite BM25 top-up contract (the
  "never silent miss" feature) with pointer to MemPalace#951 + MemPalace#823.
- Open-PR table: expand MemPalace#999 scope line to mention 8 sites + architectural
  note, update MemPalace#1000 to reflect post-MemPalace#995 rebase, add MemPalace#1005 with Copilot
  fixes noted.

CLAUDE.md:
- PR status header: 7 open -> 8 open (adds MemPalace#1005).
- Same PR row updates as README for MemPalace#999/MemPalace#1000/MemPalace#1005.
- Fork Changes list: expand entry 11 (None guards) to 8 sites + adapter
  consolidation proposal on MemPalace#999; add entry 14 for the warnings+BM25
  feature; keep 12 and 13 as-is.

42 README-claim tests still pass.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
Concrete evidence:
- New "What this looks like in practice" section right after the
  status line, showing a real stop-hook systemMessage output and a
  real mempalace_search response shape (warnings, available_in_scope,
  matched_via tags, similarity scores). Someone evaluating the fork
  can see what "running in production" actually looks like.

Headlines box:
- New "Headlines" subsection at the top of Fork Changes with the
  three differentiators someone should know if they only read one
  section — silent-save hook, ChromaDB 1.5.x hardening (quarantine +
  None guards), search-never-misses contract. Links to MemPalace#673/MemPalace#999/
  MemPalace#1000/MemPalace#1005 so readers can jump to the work itself.

Citations for comparison table:
- Every row now links to its upstream repo: Hindsight (vectorize-io),
  Mem0 + OpenMemory (mem0ai), Cognee (topoteretes), Letta (letta-ai),
  engram (NickCirv), CaviraOSS OpenMemory. Cognee row updated since
  they've added MCP support since we first wrote the row.
- Replaced the "Systems mentioned without captured primary URLs"
  footnote (now stale since we have them) with a "Verification note"
  that's honest about the point-in-time nature of these columns and
  explains why TagMem is absent.

Structure cleanups:
- Removed the upstream MemPalace logo at the top — it's milla-
  jovovich's asset and using it in a fork README is awkward.
- Renamed "Roadmap" → "Planned work" — the section is decisive with
  priorities and time estimates, "Roadmap" was underselling it.

No content removed from the document beyond the stale footnote and
the upstream logo. All existing sections intact. 42 README-claim
tests pass.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 18, 2026
Every remaining row in "Still ahead of upstream" now carries a status
so the reader can tell at a glance whether each change is being
upstreamed, pending a PR, or deliberately fork-local.

Dropped:
- "Guard ChromaDB 1.5.x metadata-mismatch segfault" — this row was
  overstated. The memory file for today's debugging notes that the
  try-get/except-create pattern is defensive code that never
  reproduced a specific crash (the actual crashes traced to HNSW
  drift). Leaving it in "Still ahead" implied an upstream-candidate
  fix, which it isn't. Code stays in place as defensive, but the
  README no longer claims it as a fork-ahead feature.

Moved to Superseded:
- "Stale HNSW mtime detection + mempalace_reconnect" — upstream took
  a different approach in MemPalace#757. Our broader inode+mtime detection
  and the mempalace_reconnect MCP tool remain as fork-local
  convenience; they're just not "ahead of upstream" anymore.

Statuses now populated:
- Linked PR number for the 7 changes with active upstream PRs
  (MemPalace#659, MemPalace#660, MemPalace#661, MemPalace#673 with APPROVED note, MemPalace#999, MemPalace#1000, MemPalace#1005).
- "PR pending" for 3 items that are good candidates but unfiled:
  epsilon mtime comparison, max_distance parameter, tool output
  mining.
- "fork-only" for 2 items we keep intentionally without pitching
  upstream: .blob_seq_ids_migrated marker (narrow), bulk_check_mined
  (complementary to upstream's MemPalace#784 file-locking).

Legend sentence added above the table explains the three status
values. 42 README-claim tests pass.
Copy link
Copy Markdown

@Dialectician Dialectician left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read the diff end to end, ran the full test suite on your main + this branch (998/998 pass in 40.6s on ChromaDB 1.5.8). Think this lands well. Two things to call out, neither a blocker.

Gate semantics on allow_fallback = (max_distance <= 0.0)

search_memories defaults max_distance = 0.0, so CLI callers (no override) get the BM25 fallback. But tool_search in mcp_server.py defaults to max_distance = 1.5 and always passes it through, so the MCP path hits 1.5 <= 0.0, False, fallback disabled by default. If an agent hits a drifted palace via the MCP tool, they get the silent underdeliver this PR is designed to prevent, because the gate conflates "permissive default" with "strict caller-set threshold."

Might be intentional if you are scoping to CLI for now. If not, cleaner gate is Optional[float] with None meaning "allow fallback" and any explicit value meaning "respect strict mode." Happy to follow up with a small PR if you confirm which way.

_count_in_scope on unfiltered scopes

When both wing and room are None, where is None and the function paginates all drawers to count. At 137K with PAGE=5000 that is ~28 round trips per search just for the scope count. drawers_col.count() does this O(1) when no filter is needed.

Also worth noting: the pre-mutation vector_hit_count capture (so the scope warning still fires after BM25 top-up) is load-bearing subtlety. Worth a comment if you have not already lost patience with refactor resistance.

Peace B.Sweet

Copy link
Copy Markdown

@Dialectician Dialectician left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workload validation from a real palace (2487 drawers, 10 wings, active production use as memory backing for an agent stack). This palace has HNSW drift, every filtered-by-wing query hits the planner error. Exact reproduction of the scenario this PR addresses.

Three CLI queries against --wing <real-wing-id> with keywords "federation", "test", "memory":

Before your fork (mempalace 3.3.0, current installed runtime):

Search error: Search error: Error executing plan: Internal error: Error finding id

Hard SearchError. Caller sees nothing.

With your fork:

  • Warning: vector search unavailable: Error executing plan: Internal error: Error finding id
  • Scope count: Scope has: 22 drawers matching filter / 2280 drawers matching filter
  • BM25 top-up: vector search returned 0 of 3 requested; filled N from sqlite+BM25 keyword match
  • Actionable hint: Run mempalace repair to rebuild the HNSW index for full semantic recall.

Every contract promise in the PR description fires as advertised on real data. The vector_underdelivered pre-mutation capture also proved load-bearing: in one query the BM25 fallback filled 3 of 3 requested and the "2280 drawers in scope, vector ranked 0" warning still surfaced, which is the correct signal (vector is still the degraded layer even though the final hit count is fine).

Separately: the gate-semantics note from my earlier review (CLI default max_distance=0.0 enables BM25 fallback, MCP default 1.5 disables it) is more significant than I initially weighted it. The MCP path, which is the primary access pattern for agents querying this palace, would see "0 hits" silently on this same failure rather than "0 hits + warning + BM25 fill." For a drifted palace being queried by Claude through MCP, the agent would conclude the memory is empty when it is only the ranking layer that is broken. Happy to send the Optional[float] gate fix as a small follow-up PR if you greenlight.

Peace B.Sweet

jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
- _count_in_scope: use col.count() for unfiltered scopes (O(1) vs 28
  paginated round trips on 137K drawers when wing/room filter is absent)
- allow_fallback gate: fire BM25 fallback when max_distance is permissive
  (<=0.0) OR when a vector error occurred (warnings non-empty). Fixes the
  MCP path: tool_search passes max_distance=1.5, which the old gate read
  as "strict" and silently skipped fallback — so agents on a drifted palace
  saw 0 hits instead of BM25 coverage + a warning
- Add explanatory comment on vector_hit_count pre-mutation capture
- Update test mocks with explicit count.return_value (required now that
  _count_in_scope calls col.count() for unfiltered scopes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 19, 2026

Addressed both Dialectician review points:

Gate semantics: Replaced allow_fallback=(max_distance <= 0.0) with allow_fallback = (max_distance <= 0.0) or bool(warnings). The warnings list is non-empty only when a vector error occurred (the except Exception as e block in the query path). This means:

  • CLI callers (default max_distance=0.0): fallback fires as before
  • MCP tool_search (default max_distance=1.5) on a drifted palace: fallback now fires correctly
  • MCP tool_search on a working HNSW with strict filter that legitimately eliminates all results: no fallback (correct — vector isn't broken, the filter is just strict)

_count_in_scope performance: Unfiltered scopes (no wing/room filter, where is empty) now use Collection.count() which is O(1). Paginated get() still runs for filtered scopes since count() doesn't accept a where argument.

Also added the comment on vector_hit_count pre-mutation capture Dialectician called out as load-bearing subtlety.

@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 19, 2026

Thanks Dialectician — really appreciate the thorough review. Running the full suite + validating on a live drifted palace is exactly the kind of signal that's hard to fake, and the 'pre-mutation vector_hit_count capture is load-bearing' note saved a comment I'd been meaning to add.

Hoping the fixes hold up as well on your palace as they did in the test run.


On the three Copilot inline comments:

available_in_scope cap: _count_in_scope uses a paginated get(include=[]) loop (not a single capped get) so the count is authoritative for filtered scopes. Unfiltered scopes now use Collection.count() (O(1), no cap). Copilot was describing an earlier draft.

Run repair warning gate: Fixed by capturing vector_hit_count = len(hits) before BM25 may extend hits. The warning now gates on vector_underdelivered (pre-BM25) rather than final hit count, so it surfaces even when the fallback fills the request.

CLI missing palace path: The CLI search() now enriches the 'No palace found' error payload with the path before printing, so the output is 'No palace found at \<path\>' again.

@Dialectician
Copy link
Copy Markdown

Pulled b9a6585, ran the test suite (974 pass, 106 deselected), spot-checked the runtime against the repaired palace. The (max_distance <= 0.0) or bool(warnings) gate is a better solution than the Optional[float] shape I suggested. It captures the real invariant (vector is broken so fall back) rather than a proxy (caller did or did not explicitly set a strict threshold). In particular it handles the case my proposal would have gotten wrong: MCP strict caller with a working HNSW and a legitimately empty filter scope. No fallback in that case, correct.

count() short-circuit on unfiltered scopes looks clean too. Comment on the vector_hit_count capture helps.

Good trade. LGTM on this round.

Peace B.Sweet

jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
- _count_in_scope: use col.count() for unfiltered scopes (O(1) vs 28
  paginated round trips on 137K drawers when wing/room filter is absent)
- allow_fallback gate: fire BM25 fallback when max_distance is permissive
  (<=0.0) OR when a vector error occurred (warnings non-empty). Fixes the
  MCP path: tool_search passes max_distance=1.5, which the old gate read
  as "strict" and silently skipped fallback — so agents on a drifted palace
  saw 0 hits instead of BM25 coverage + a warning
- Add explanatory comment on vector_hit_count pre-mutation capture
- Update test mocks with explicit count.return_value (required now that
  _count_in_scope calls col.count() for unfiltered scopes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
…review addressed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Three corrections from @copilot-pull-request-reviewer:

1. available_in_scope was capped at n_results*20 via the pool get(),
   so the "sqlite-authoritative" framing was misleading on large
   palaces (would report at most 100 even when 5000+ drawers matched).
   Split the count out of the pool fetch: new _count_in_scope()
   paginates col.get(include=[], limit=5000, offset=K) to sum an
   authoritative total. Pagination keeps each query well under
   MemPalace#950's SQL-variable limit.

2. The "Run mempalace repair" warning was gated on
   len(hits) < n_results, so it never surfaced when the BM25 fallback
   filled the request — even though the vector index was the actual
   degraded layer. Added a vector_underdelivered flag captured before
   fallback runs; the warning now gates off that flag so users see
   "HNSW can't reach the rest" whenever the vector path fell short,
   regardless of how much BM25 papered over.

3. CLI "No palace found" lost the palace path after MemPalace#1005's refactor
   to delegate through search_memories(). Preserve it explicitly in
   search() — the structured error payload stays path-agnostic, but
   the CLI adds "at <palace_path>" when printing so users still see
   which palace failed to open.

Also updated _sqlite_fallback_and_scope() signature to take
vector_underdelivered separately from allow_fallback, matching what
the two warnings now need to distinguish.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
- _count_in_scope: use col.count() for unfiltered scopes (O(1) vs 28
  paginated round trips on 137K drawers when wing/room filter is absent)
- allow_fallback gate: fire BM25 fallback when max_distance is permissive
  (<=0.0) OR when a vector error occurred (warnings non-empty). Fixes the
  MCP path: tool_search passes max_distance=1.5, which the old gate read
  as "strict" and silently skipped fallback — so agents on a drifted palace
  saw 0 hits instead of BM25 coverage + a warning
- Add explanatory comment on vector_hit_count pre-mutation capture
- Update test mocks with explicit count.return_value (required now that
  _count_in_scope calls col.count() for unfiltered scopes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jphein jphein force-pushed the feat/search-warnings-sqlite-fallback branch from b9a6585 to 9e45485 Compare April 19, 2026 01:05
@Dialectician
Copy link
Copy Markdown

Heads-up on 7c8bb59: one test regression from the count() short-circuit. Full run is 1 failed, 1024 passed.

FAILED tests/test_searcher.py::TestSearchCLI::test_search_handles_none_metadata_without_crash
E       TypeError: > not supported between instances of MagicMock and int
searcher.py:683

The test mocks mock_col but does not stub count(), so _count_in_scope returns a MagicMock, which then fails the available_in_scope > vector_hit_count compare.

Fix is one line, same pattern you applied to the other four tests in b9a6585:

mock_col = MagicMock()
mock_col.count.return_value = 2  # any int; test does not otherwise care
mock_col.query.return_value = {...}

(Runtime behavior is unaffected because real ChromaDB count() returns an int; the miss is mock-only.)

Peace B.Sweet

@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 22, 2026

Thanks @Dialectician — verified green. The regression at 7c8bb59 was the test-only mock_col.count wiring, fixed by c8bb47e. Current state on the PR head: 6 CI checks SUCCESS, mergeable CLEAN. Appreciate the catch.

@igorls igorls added enhancement New feature or request area/search Search and retrieval labels Apr 24, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 24, 2026
Three corrections from @copilot-pull-request-reviewer:

1. available_in_scope was capped at n_results*20 via the pool get(),
   so the "sqlite-authoritative" framing was misleading on large
   palaces (would report at most 100 even when 5000+ drawers matched).
   Split the count out of the pool fetch: new _count_in_scope()
   paginates col.get(include=[], limit=5000, offset=K) to sum an
   authoritative total. Pagination keeps each query well under
   MemPalace#950's SQL-variable limit.

2. The "Run mempalace repair" warning was gated on
   len(hits) < n_results, so it never surfaced when the BM25 fallback
   filled the request — even though the vector index was the actual
   degraded layer. Added a vector_underdelivered flag captured before
   fallback runs; the warning now gates off that flag so users see
   "HNSW can't reach the rest" whenever the vector path fell short,
   regardless of how much BM25 papered over.

3. CLI "No palace found" lost the palace path after MemPalace#1005's refactor
   to delegate through search_memories(). Preserve it explicitly in
   search() — the structured error payload stays path-agnostic, but
   the CLI adds "at <palace_path>" when printing so users still see
   which palace failed to open.

Also updated _sqlite_fallback_and_scope() signature to take
vector_underdelivered separately from allow_fallback, matching what
the two warnings now need to distinguish.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 24, 2026
- _count_in_scope: use col.count() for unfiltered scopes (O(1) vs 28
  paginated round trips on 137K drawers when wing/room filter is absent)
- allow_fallback gate: fire BM25 fallback when max_distance is permissive
  (<=0.0) OR when a vector error occurred (warnings non-empty). Fixes the
  MCP path: tool_search passes max_distance=1.5, which the old gate read
  as "strict" and silently skipped fallback — so agents on a drifted palace
  saw 0 hits instead of BM25 coverage + a warning
- Add explanatory comment on vector_hit_count pre-mutation capture
- Update test mocks with explicit count.return_value (required now that
  _count_in_scope calls col.count() for unfiltered scopes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jphein jphein force-pushed the feat/search-warnings-sqlite-fallback branch from c8bb47e to a3e90a1 Compare April 24, 2026 21:25
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
Three corrections from @copilot-pull-request-reviewer:

1. available_in_scope was capped at n_results*20 via the pool get(),
   so the "sqlite-authoritative" framing was misleading on large
   palaces (would report at most 100 even when 5000+ drawers matched).
   Split the count out of the pool fetch: new _count_in_scope()
   paginates col.get(include=[], limit=5000, offset=K) to sum an
   authoritative total. Pagination keeps each query well under
   MemPalace#950's SQL-variable limit.

2. The "Run mempalace repair" warning was gated on
   len(hits) < n_results, so it never surfaced when the BM25 fallback
   filled the request — even though the vector index was the actual
   degraded layer. Added a vector_underdelivered flag captured before
   fallback runs; the warning now gates off that flag so users see
   "HNSW can't reach the rest" whenever the vector path fell short,
   regardless of how much BM25 papered over.

3. CLI "No palace found" lost the palace path after MemPalace#1005's refactor
   to delegate through search_memories(). Preserve it explicitly in
   search() — the structured error payload stays path-agnostic, but
   the CLI adds "at <palace_path>" when printing so users still see
   which palace failed to open.

Also updated _sqlite_fallback_and_scope() signature to take
vector_underdelivered separately from allow_fallback, matching what
the two warnings now need to distinguish.
@jphein jphein force-pushed the feat/search-warnings-sqlite-fallback branch from a3e90a1 to eb69eb7 Compare April 25, 2026 14:39
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
- _count_in_scope: use col.count() for unfiltered scopes (O(1) vs 28
  paginated round trips on 137K drawers when wing/room filter is absent)
- allow_fallback gate: fire BM25 fallback when max_distance is permissive
  (<=0.0) OR when a vector error occurred (warnings non-empty). Fixes the
  MCP path: tool_search passes max_distance=1.5, which the old gate read
  as "strict" and silently skipped fallback — so agents on a drifted palace
  saw 0 hits instead of BM25 coverage + a warning
- Add explanatory comment on vector_hit_count pre-mutation capture
- Update test mocks with explicit count.return_value (required now that
  _count_in_scope calls col.count() for unfiltered scopes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
Two new scripts plus legitimate fixes they surfaced.

scripts/preflight.sh — runs the same checks CI does (ruff check, ruff
format --check, pytest). Prevents the class of "I ran ruff format locally
but CI runs ruff check too" bugs we hit on PR MemPalace#1024 and PR MemPalace#1198 today.

scripts/rebase-on-develop.sh — rebases a fork PR branch onto
upstream/develop, then auto-restores fork-only collateral
(.claude-plugin/hooks/*.sh, .claude-plugin/.mcp.json) to upstream's
state. The collateral cleanup was the recurring step I had to remember
manually for each of MemPalace#1005/MemPalace#1024/MemPalace#1177; this codifies it. Supports
--finish for the "after I resolved conflicts manually" continuation
path. Never auto-pushes — prints the push command for confirmation.

Lint debt the new preflight caught on main:

- tests/test_palace_graph.py: F811 — `invalidate_graph_cache` was
  imported at line 9, then re-imported inside the chromadb-mock
  patch.dict block at line 44. Removed the duplicate; the line-9
  import is sufficient since the autouse fixture at line 12 already
  uses it.
- mempalace/backends/chroma.py: ruff format wanted a one-line
  `collection.modify(configuration=...)` call instead of the wrapped
  multi-line form.
- mempalace/hooks_cli.py: ruff format wanted blank line after import
  in the daemon-URL try block, and dict-literal style for the
  json.dumps call.

CI on jphein/mempalace runs both ruff check and ruff format --check;
without these fixes the next push to main would have shown red.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented May 3, 2026

Posting an update now that #1306 (candidate_strategy="union") has merged into develop.

The two PRs address overlapping but distinct failure modes, and they aren't strictly redundant:

#1306 (candidate_strategy="union", merged) #1005 (this PR)
Mechanism Pre-rerank pool widening: vector ∪ BM25 candidates feed _hybrid_rank together Post-hoc top-up: vector path runs first, BM25 promotes only if len(hits) < n_results
Failure mode Query embeds close to the wrong content semantically (terminology guides, narrative-shaped queries) HNSW sparse / drifted / #951 filter-planner failure / vector path raised
Default behavior Opt-in (candidate_strategy="vector" is unchanged) Auto (warnings always populated; top-up triggers when underdelivered)
warnings: [...] field No Yes — explicit "vector search unavailable: ..." surface
available_in_scope: N No Yes — sqlite-authoritative scope count
Vector exception handling Raises Caught → warning, returns BM25-only

The observability slice (warnings + available_in_scope + no-raise on vector errors) is novel and not covered by #1306. The candidate-pool-widening slice overlaps semantically.

Three paths I'd be happy with — your call:

  1. Reframe + rebase. Keep the observability slice and the post-hoc top-up; cite feat(searcher): candidate_strategy="union" — BM25 candidates joined with vector pool before hybrid rerank #1306 as orthogonal. The post-hoc top-up still earns its keep because feat(searcher): candidate_strategy="union" — BM25 candidates joined with vector pool before hybrid rerank #1306's union mode is opt-in and doesn't help callers who didn't pass the flag (or who hit infra-shaped failures rather than query-shape mismatches).
  2. Slim down + rebase. Drop the BM25 top-up entirely; keep just the warnings + available_in_scope + no-raise. Smaller, narrower, easier to review.
  3. Close as superseded. If you'd rather have one knob (candidate_strategy="union") than two coexisting fallback shapes, I'll close.

The current branch is several weeks pre-#1306, so any rebase will be a clean re-apply on top of the new searcher.py — no merge churn for you to read through. Happy to do whichever you prefer.

jphein added a commit to jphein/mempalace that referenced this pull request May 6, 2026
Three corrections from @copilot-pull-request-reviewer:

1. available_in_scope was capped at n_results*20 via the pool get(),
   so the "sqlite-authoritative" framing was misleading on large
   palaces (would report at most 100 even when 5000+ drawers matched).
   Split the count out of the pool fetch: new _count_in_scope()
   paginates col.get(include=[], limit=5000, offset=K) to sum an
   authoritative total. Pagination keeps each query well under
   MemPalace#950's SQL-variable limit.

2. The "Run mempalace repair" warning was gated on
   len(hits) < n_results, so it never surfaced when the BM25 fallback
   filled the request — even though the vector index was the actual
   degraded layer. Added a vector_underdelivered flag captured before
   fallback runs; the warning now gates off that flag so users see
   "HNSW can't reach the rest" whenever the vector path fell short,
   regardless of how much BM25 papered over.

3. CLI "No palace found" lost the palace path after MemPalace#1005's refactor
   to delegate through search_memories(). Preserve it explicitly in
   search() — the structured error payload stays path-agnostic, but
   the CLI adds "at <palace_path>" when printing so users still see
   which palace failed to open.

Also updated _sqlite_fallback_and_scope() signature to take
vector_underdelivered separately from allow_fallback, matching what
the two warnings now need to distinguish.
jphein added a commit to jphein/mempalace that referenced this pull request May 6, 2026
- _count_in_scope: use col.count() for unfiltered scopes (O(1) vs 28
  paginated round trips on 137K drawers when wing/room filter is absent)
- allow_fallback gate: fire BM25 fallback when max_distance is permissive
  (<=0.0) OR when a vector error occurred (warnings non-empty). Fixes the
  MCP path: tool_search passes max_distance=1.5, which the old gate read
  as "strict" and silently skipped fallback — so agents on a drifted palace
  saw 0 hits instead of BM25 coverage + a warning
- Add explanatory comment on vector_hit_count pre-mutation capture
- Update test mocks with explicit count.return_value (required now that
  _count_in_scope calls col.count() for unfiltered scopes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jphein jphein force-pushed the feat/search-warnings-sqlite-fallback branch from eb69eb7 to 03ecefd Compare May 6, 2026 09:39
When the vector index returns fewer than n_results (sparse HNSW
post-repair, MemPalace#951 filter-planner failure, drift), search_memories now:

1. Computes an authoritative scope count via paginated col.get(),
   surfaced as `available_in_scope` in the response. Caps each query
   below MemPalace#950's SQL-variable limit.

2. Tops up the hits list with BM25-ranked sqlite candidates tagged
   `matched_via: "sqlite_bm25_fallback"` when the vector path is
   under-delivering. Skips candidates with BM25 score 0 so the
   fallback never pads with unrelated content.

3. Returns `warnings: [...]` describing when fallback fired and when
   the scope contains more drawers than the vector path can rank
   (gated on a `vector_underdelivered` flag captured before fallback
   runs, so the warning surfaces even when BM25 papered over the gap).

CLI search() delegates to search_memories() so terminal output and
MCP responses share the same retrieval, fallback, and warning
semantics. Preserves the palace path in printed errors.

Closes the silent 0-hit failure mode where data was in sqlite but
the vector path returned nothing — visible to the user via warnings
and `available_in_scope`, fixable via `mempalace repair`.

Tests: 29/29 pass on rebased branch (Python 3.9 floor honored via
Optional[int]). Mock setup updated to set count.return_value so the
new "more in scope" warning path doesn't fail on MagicMock comparison.

Squashed rebase against current upstream/develop (post-MemPalace#1377). Was
filed as 5-commit history; squashed for cleaner review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jphein jphein force-pushed the feat/search-warnings-sqlite-fallback branch from 03ecefd to d0f4e32 Compare May 6, 2026 09:46
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented May 6, 2026

Rebased against current develop (post-#1377/#1029/#1019/#988) and squashed the 5-commit history into one focused commit (d0f4e32). Preserves Dialectician's vector_underdelivered fix, Copilot's _count_in_scope paginated authoritative count, and the Optional[int] Python 3.9 compat. Mock setup in test_effective_distance_clamped_to_valid_cosine_range (added by #1029 while this PR was waiting) needed mock_col.count.return_value = 2 so the new "more in scope" warning path doesn't compare a MagicMock to an int; that change matches the same pattern @copilot-pull-request-reviewer originally asked for.

Local: 1555 passed, 0 failed (Python 3.12, full suite). CI status will populate shortly.

Mergeable now. Ready when you are.

…noqa

The squash in d0f4e32 pushed search_memories one branch over ruff's
C901 threshold (26 > 25). The function is doing meaningful work —
dispatch across vector / BM25-only / candidate-strategy / sqlite-
fallback / warning paths, each of which is one branch — so an
extraction here would either thrash the diff or re-run the A/B
benchmarks that gated the original split decisions. Adding a noqa
with intent annotated is the smaller, lower-risk fix; an extraction
can be a follow-up if maintainers prefer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/search Search and retrieval enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants