fix: clamp similarity scores to [0,1] to prevent negative values #988
Conversation
Pull request overview
This PR fixes negative similarity scores produced when converting ChromaDB cosine distances to similarity, ensuring downstream clients don’t see meaningless negative values (per Issue #978).
Changes:
- Clamp `tool_check_duplicate()` similarity to floor at `0.0` when `dist > 1`.
- Clamp `Layer3.search()` similarity to floor at `0.0` when `dist > 1`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| mempalace/mcp_server.py | Floors duplicate-check similarity at 0.0 to prevent negative values. |
| mempalace/layers.py | Floors Layer3 formatted search similarity at 0.0 to prevent negative values. |
```diff
 dist = results["distances"][0][i]
-similarity = round(1 - dist, 3)
+similarity = round(max(0.0, 1 - dist), 3)
 if similarity >= threshold:
```
This change modifies the JSON similarity value returned by tool_check_duplicate() when dist > 1 (it now floors at 0.0). Since tests/test_mcp_server.py already covers tool_check_duplicate(), please add a regression test that mocks/forces a distance slightly above 1.0 and asserts the returned similarity is clamped to 0.0 (not negative).
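A regression test along those lines might look like this sketch. It uses an illustrative stand-in for the patched conversion (`similarity_from_distance` is not a name from the codebase); the real test would call `tool_check_duplicate()` against a mocked collection:

```python
from unittest.mock import MagicMock

def similarity_from_distance(dist: float) -> float:
    """Stand-in mirroring the patched conversion: floors at 0.0."""
    return round(max(0.0, 1 - dist), 3)

def test_similarity_clamped_when_distance_exceeds_one():
    # Force a Chroma-style query result with a distance slightly above 1.0.
    collection = MagicMock()
    collection.query.return_value = {"distances": [[1.05]]}

    dist = collection.query()["distances"][0][0]
    similarity = similarity_from_distance(dist)

    # Clamped to 0.0, never negative.
    assert similarity == 0.0

test_similarity_clamped_when_distance_exceeds_one()
```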
```diff
 for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
-    similarity = round(1 - dist, 3)
+    similarity = round(max(0.0, 1 - dist), 3)
     wing_name = meta.get("wing", "?")
```
Layer3.search() now clamps similarity to avoid negative values, but Layer3.search_raw() below still computes round(1 - dist, 3) (see the similarity field in the raw hit dict). That means negative similarity scores can still leak to API consumers via search_raw. Consider applying the same max(0.0, 1 - dist) clamp there as well for consistency and to fully address the underlying issue.
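Applied to `search_raw`, the same clamp could look like the sketch below; the hit-dict field names here are assumed for illustration, not taken from the codebase:

```python
# Sample Chroma-style parallel result lists; 1.3 exceeds 1.0 (possible
# for maximally dissimilar vectors under cosine distance, range [0, 2]).
docs = ["close match", "far match"]
metas = [{"wing": "library"}, {"wing": "attic"}]
dists = [0.2, 1.3]

hits = [
    {
        "document": doc,
        "metadata": meta,
        "distance": dist,
        # was: round(1 - dist, 3), which goes negative for dist > 1
        "similarity": round(max(0.0, 1 - dist), 3),
    }
    for doc, meta, dist in zip(docs, metas, dists)
]

assert [h["similarity"] for h in hits] == [0.8, 0.0]
```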
```diff
 for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists), 1):
-    similarity = round(1 - dist, 3)
+    similarity = round(max(0.0, 1 - dist), 3)
     wing_name = meta.get("wing", "?")
```
This change alters similarity semantics when dist > 1 (similarity now floors at 0.0). There are existing tests for Layer3.search in tests/test_layers.py; please add a regression test case with a mocked distance slightly > 1.0 to assert the rendered sim= value is 0.0 (and not negative).
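A sketch of that test, using a stand-in for the rendering loop (`render_hit` is an illustrative name; the real test would drive `Layer3.search()` with a mocked collection):

```python
def render_hit(i: int, meta: dict, dist: float) -> str:
    """Stand-in for the Layer3.search() formatting shown in the diff."""
    similarity = round(max(0.0, 1 - dist), 3)
    wing_name = meta.get("wing", "?")
    return f"{i}. [{wing_name}] sim={similarity}"

def test_rendered_similarity_never_negative():
    # A distance slightly above 1.0 previously rendered as sim=-0.02.
    line = render_hit(1, {"wing": "kitchen"}, 1.02)
    assert "sim=0.0" in line
    assert "sim=-" not in line

test_rendered_similarity_never_negative()
```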
#979 (2026-04-17) covers the same fix, but #979 bundles unrelated scope (version bumps in …).
``search_memories`` computes ``effective_dist = dist - boost``, where ``boost`` can be as large as ``CLOSET_RANK_BOOSTS[0] == 0.40`` for a rank-0 closet hit. When the raw drawer distance is small (any near-exact match), the subtraction goes negative. Two downstream effects:

1. Line 418 returns ``round(max(0.0, 1 - effective_dist), 3)`` as ``similarity``. With ``effective_dist = -0.30`` that yields ``similarity = 1.30``, outside the documented ``[0, 1]`` range. The ``max(0.0, ...)`` only prevents negative similarities; it does not cap above 1.
2. Line 427 stores ``_sort_key: effective_dist`` and line 435 sorts ``scored`` ascending by that key. A negative key drops *below* the rest, so the strongest hybrid matches end up sorting after weaker ones: ranking inversion under the exact conditions hybrid retrieval is supposed to serve best.

Clamp ``effective_dist`` to the valid cosine-distance range ``[0, 2]``. The boost still wins (a closet-backed hit still ranks first); it just no longer flips the order.

Test added: mock drawer_col (base dist 0.08 / 0.35 for two sources) + closet_col (rank-0 closet for the 0.08 source) → assert all hits have ``0 <= similarity <= 1`` and ``0 <= effective_distance <= 2``, and that the closet-boosted source still ranks first.

Relationship to other PRs:

* **MemPalace#988** clamps the output ``similarity`` alone. That does not fix the sort-key inversion or the invalid ``effective_distance`` in the returned dict. This PR clamps at the arithmetic source, so both downstream users of the value stay in range.
* Orthogonal to **MemPalace#979** (``tool_check_duplicate`` negative similarity).
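The clamp-at-source fix can be sketched as follows; ``score_hit`` is an illustrative helper, and only ``CLOSET_RANK_BOOSTS[0] == 0.40`` comes from the description above:

```python
RANK0_BOOST = 0.40  # CLOSET_RANK_BOOSTS[0] per the description above

def score_hit(dist: float, boost: float) -> tuple[float, float]:
    # Clamp at the arithmetic source: keep effective_dist inside the valid
    # cosine-distance range [0, 2], so both the ascending sort key and the
    # derived similarity stay in range.
    effective_dist = min(2.0, max(0.0, dist - boost))
    similarity = round(max(0.0, 1 - effective_dist), 3)
    return effective_dist, similarity

# Near-exact drawer hit (0.08) with a rank-0 closet boost:
boosted = score_hit(0.08, RANK0_BOOST)
# Weaker hit with no boost:
plain = score_hit(0.35, 0.0)

assert boosted == (0.0, 1.0)      # clamped, not (-0.32, 1.32)
assert boosted[0] <= plain[0]     # boosted hit still sorts first (ascending key)
assert 0.0 <= plain[1] <= 1.0
```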
Bumping: verified the issue is real on develop. Note: there's a competing PR, #979, covering the same ground, but at +147/-11 across 12 files vs. your +2/-2 surgical fix. We're going to close #979 as a duplicate and prefer this one. Please confirm CI is green on the latest commit and we can land it.
fbe287d to ad68ff6 · Compare
Rebased onto develop. Note: the lint check on develop is pre-existing-failing from #1305 (E402 in …).
ad68ff6 to f2bed92 · Compare
Fix: Clamp similarity scores to [0,1]
Both `tool_check_duplicate()` and `Layer3.search()` were returning negative similarity scores for very dissimilar content.

Root Cause

Both used `1 - dist` to convert cosine distance to similarity. With cosine distance, ChromaDB distances can exceed 1.0 for maximally dissimilar vectors, making `1 - dist` negative.

The rest of the codebase already handles this correctly: line 285 of … uses the `max(0.0, ...)` clamp.
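A minimal demonstration of the bug and the fix:

```python
# ChromaDB cosine distance ranges over [0, 2]; anything above 1.0
# made the naive conversion go negative.
dist = 1.6

naive = round(1 - dist, 3)               # the bug: goes negative
clamped = round(max(0.0, 1 - dist), 3)   # the fix: floors at 0.0

assert naive == -0.6
assert clamped == 0.0
```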
Changes
Fixes: `tool_check_duplicate` in `mcp_server.py` and `Layer3.search()` in `layers.py` can return negative similarity scores for very dissimilar content. #978.