Hash embedding mode should use lexical/BM25 search fallback #1380

@vanachterjacob

Description

Problem

When embedding_device is set to hash (the lexical fallback mode), search can return low-signal vector hits instead of obvious lexical matches.

Observed query:

mempalace reindex database hnsw embeddings

The MCP/vector path returned unrelated entries such as stopword/i18n files and NUL-byte-like documents, while SQLite FTS/BM25 found relevant HNSW/reindex rows immediately.

Root Cause

Hash embeddings are a local lexical fallback, but the search path still treats them like dense semantic embeddings. As a result, vector distance is a weak ranking signal for some real queries. Separately, the SQLite FTS fallback selected candidates without ORDER BY rank, so low-rowid partial matches could fill the candidate limit and crowd out better candidates before BM25 reranking ever saw them.
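
To illustrate the candidate-selection half of the root cause, here is a minimal sketch using Python's built-in sqlite3 module with an FTS5 table. The table name `docs`, the sample rows, and the helper names are hypothetical and not taken from the mempalace code base; it also assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical query: any of these terms counts as a (partial) match.
QUERY = "reindex OR database OR hnsw OR embeddings"


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory FTS5 table with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
    rows = [
        "embeddings stopword list for i18n",        # low rowid, partial match
        "database backup notes",                    # low rowid, partial match
        "misc notes about embeddings",              # partial match
        "reindex the hnsw database after embeddings change",  # strong match
    ]
    conn.executemany("INSERT INTO docs(body) VALUES (?)", [(r,) for r in rows])
    return conn


def fts_candidates(conn: sqlite3.Connection, match_expr: str,
                   limit: int, ranked: bool) -> list:
    """Fetch up to `limit` FTS candidates, optionally ordered by FTS5 rank."""
    sql = "SELECT rowid, body FROM docs WHERE docs MATCH ?"
    if ranked:
        # FTS5's built-in `rank` column defaults to BM25; best matches sort first.
        sql += " ORDER BY rank"
    sql += " LIMIT ?"
    return conn.execute(sql, (match_expr, limit)).fetchall()


if __name__ == "__main__":
    conn = build_demo_index()
    print("unranked:", fts_candidates(conn, QUERY, 2, ranked=False))
    print("ranked:  ", fts_candidates(conn, QUERY, 2, ranked=True))
```

Without ORDER BY, SQL does not guarantee any row order, and in practice the limit tends to be filled by whichever matching rows the scan reaches first. With ORDER BY rank, the row that matches the rarer terms is selected before the limit is applied.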

Expected Behavior

When hash/lexical embedding mode is active, search should use the SQLite BM25 path intentionally. FTS candidate selection should return the best-ranked candidates first, so that local BM25 reranking operates on relevant rows.

Proposed Fix

  • Route MCP and CLI search to SQLite BM25 when embedding_device=hash or lexical
  • Add ORDER BY rank to the SQLite FTS candidate query
  • Preserve the existing vector path for normal ONNX/dense embedding mode
  • Surface the fallback reason in structured search results
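
A rough sketch of how the routing and the surfaced fallback reason could fit together is shown below. The names `SearchRoute`, `choose_search_route`, and the backend strings are hypothetical and only illustrate the proposal; they are not the project's actual API.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these are the embedding_device values that mean "lexical mode".
LEXICAL_DEVICES = {"hash", "lexical"}


@dataclass(frozen=True)
class SearchRoute:
    """Which backend to use, plus an optional reason to show in results."""
    backend: str                    # "sqlite_bm25" or "vector" (hypothetical labels)
    fallback_reason: Optional[str]  # None when the normal vector path is used


def choose_search_route(embedding_device: str) -> SearchRoute:
    """Route lexical embedding modes to SQLite BM25; keep vectors otherwise."""
    device = embedding_device.strip().lower()
    if device in LEXICAL_DEVICES:
        reason = f"embedding_device={device} is a lexical mode; using SQLite BM25"
        return SearchRoute(backend="sqlite_bm25", fallback_reason=reason)
    # Dense/ONNX embeddings: preserve the existing vector search path.
    return SearchRoute(backend="vector", fallback_reason=None)


if __name__ == "__main__":
    for device in ("hash", "lexical", "cpu"):
        print(device, "->", choose_search_route(device))
```

Both the MCP and CLI entry points could call the same helper, so the decision and the reason string stay consistent across the two surfaces.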

Metadata

    Labels

    area/mcp (MCP server and tools), area/search (Search and retrieval), bug (Something isn't working)