Fix HybridRetriever cutoff and KeyError with small collections #49

Open
tavian-dev wants to merge 1 commit into AmenRa:main from tavian-dev:fix/hybrid-cutoff
Conversation

@tavian-dev
Summary

Fixes two related issues in HybridRetriever:

Issue #29 (KeyError: -1 when the collection has fewer than 1000 documents):

  • search() and msearch() passed a hardcoded cutoff of 1000 to sub-retrievers
  • When the dense retriever (faiss) is asked for more results than exist, it returns -1 as a placeholder
  • map_internal_ids_to_original_ids then fails with KeyError: -1
  • Fix: filter out -1 entries in map_internal_ids_to_original_ids as a safety net
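The safety net described above can be sketched as follows. This is a minimal standalone illustration, not the code in base_retriever.py: the function name `map_ids_skipping_missing`, the `id_mapping` dictionary, and the sample document IDs are all hypothetical.

```python
from typing import Dict, Iterable, List

# faiss pads result slots with -1 when k exceeds the number of indexed vectors.
MISSING_ID = -1


def map_ids_skipping_missing(
    id_mapping: Dict[int, str], internal_ids: Iterable[int]
) -> List[str]:
    """Map internal ids to original ids, dropping the -1 placeholders."""
    return [id_mapping[i] for i in internal_ids if i != MISSING_ID]


# Hypothetical 3-document collection queried with k=5:
# the last two slots are padding rather than real hits.
id_mapping = {0: "doc_a", 1: "doc_b", 2: "doc_c"}
internal_ids = [2, 0, 1, -1, -1]
original_ids = map_ids_skipping_missing(id_mapping, internal_ids)
```

If scores are returned alongside the IDs, they would presumably need to be filtered at the same positions so that the two sequences stay aligned.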

Issue #33 — cutoff not passed to sub-retrievers:

  • The user's cutoff parameter was only applied after fusion, not to the sub-retriever calls
  • Sub-retrievers always fetched 1000 results regardless of what the user requested
  • Fix: use max(cutoff, 1000) so sub-retrievers respect large cutoffs while still fetching enough candidates for fusion quality
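A rough sketch of the cutoff logic, assuming fusion yields a mapping from document ID to fused score. The helper names `sub_retriever_cutoff` and `truncate_fused` are illustrative only and do not appear in hybrid_retriever.py.

```python
from typing import Dict

MIN_CANDIDATES = 1000  # candidate-pool floor taken from the PR description


def sub_retriever_cutoff(cutoff: int, min_candidates: int = MIN_CANDIDATES) -> int:
    """Number of results to request from each sub-retriever."""
    return max(cutoff, min_candidates)


def truncate_fused(fused_scores: Dict[str, float], cutoff: int) -> Dict[str, float]:
    """Keep the `cutoff` highest-scoring documents after fusion."""
    ranked = sorted(fused_scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:cutoff])


small = sub_retriever_cutoff(10)    # below the floor: still fetch 1000 candidates
large = sub_retriever_cutoff(5000)  # above the floor: fetch what the user asked for
top2 = truncate_fused({"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.5}, cutoff=2)
```

Because the sub-cutoff never drops below 1000, a collection with fewer documents can still be asked for more results than it holds, which is why the -1 filter remains necessary.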

Changes

  • base_retriever.py: map_internal_ids_to_original_ids now skips -1 entries
  • hybrid_retriever.py: search() and msearch() use max(cutoff, 1000) for sub-retrievers

Fixes #29, fixes #33

Two fixes:

1. Sub-retrievers in search() and msearch() used a hardcoded cutoff of
   1000, ignoring the user's cutoff. They now use max(cutoff, 1000), so
   the sub-cutoff respects a larger requested cutoff while still
   fetching enough candidates for fusion.

2. When the collection has fewer documents than the sub-cutoff, the
   dense retriever (faiss) returns -1 for missing entries, causing a
   KeyError in map_internal_ids_to_original_ids. That function now
   filters out -1 entries as a safety net.

Fixes AmenRa#29, fixes AmenRa#33
