Commit db333f2
fix(searcher): wire i18n stop_words through BM25 fallback / union path
scoring site inside _bm25_only_via_sqlite that this PR did not cover --
the original wiring landed before MemPalace#1306 existed. Without the plumbing,
two paths silently bypass locale stop_words even when the palace has
MEMPALACE_LANG set:
- vector_disabled=True (MemPalace#1222 fallback): BM25-only scoring runs
  without stop-word filtering.
- candidate_strategy="union": BM25 candidates merged into the rerank
  pool come from a tokenizer that ignored the configured locale, so
  the merged-in entries fight the lang-aware _hybrid_rank rerank.
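
To make the mismatch concrete, here is a minimal sketch of a tokenizer with and without locale stop words. The `tokenize` helper and the German stop-word subset are hypothetical stand-ins for illustration, not the searcher's actual code; only the `\w{2,}` pattern comes from the commit message.

```python
import re

# Hypothetical stand-in for the searcher's tokenizer; the real helper may differ.
_TOKEN_RE = re.compile(r"\w{2,}")

# Illustrative subset only; not the project's actual German stop-word list.
GERMAN_STOP_WORDS = frozenset({"der", "die", "das", "und", "ist"})


def tokenize(text, stop_words=frozenset()):
    """Lowercase, keep word tokens of length >= 2, drop configured stop words."""
    return [t for t in _TOKEN_RE.findall(text.lower()) if t not in stop_words]


query = "Der Hund und die Katze"
locale_blind = tokenize(query)                     # what the uncovered paths saw
locale_aware = tokenize(query, GERMAN_STOP_WORDS)  # what the lang-aware rerank sees

print(locale_blind)  # ['der', 'hund', 'und', 'die', 'katze']
print(locale_aware)  # ['hund', 'katze']
```

With the locale ignored, high-frequency function words such as "der" and "und" still contribute to BM25 scores, which is why candidates from that path can disagree with a rerank that filters them out.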
Resolution now happens once: stop_words = _resolve_stop_words(lang) is
hoisted before the vector_disabled branch and threaded through
_apply_candidate_strategy, _merge_bm25_union_candidates, and
_bm25_only_via_sqlite into _bm25_scores.
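
The hoist-and-thread pattern described above can be sketched roughly as follows. All function names, signatures, and the stop-word table here are simplified assumptions for illustration; the real `_apply_candidate_strategy`, `_merge_bm25_union_candidates`, `_bm25_only_via_sqlite`, and `_bm25_scores` take different arguments and read from sqlite rather than an in-memory list.

```python
import math
import re

_TOKEN_RE = re.compile(r"\w{2,}")
# Hypothetical table; the project's real per-locale lists are not shown in the commit.
_STOP_WORDS = {"de": frozenset({"der", "die", "das", "und"})}


def resolve_stop_words(lang):
    """Return the stop-word set for a locale, or an empty set if unknown."""
    return _STOP_WORDS.get(lang or "", frozenset())


def tokenize(text, stop_words):
    return [t for t in _TOKEN_RE.findall(text.lower()) if t not in stop_words]


def bm25_scores(query, docs, stop_words, k1=1.5, b=0.75):
    """Plain Okapi BM25; stop words are dropped from both query and documents."""
    q_terms = set(tokenize(query, stop_words))
    toks = [tokenize(d, stop_words) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n if n else 0.0
    scores = []
    for t in toks:
        s = 0.0
        for term in q_terms:
            tf = t.count(term)
            if tf == 0:
                continue
            df = sum(1 for other in toks if term in other)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def bm25_only(query, docs, stop_words):
    # Fallback path: uses the already-resolved set instead of re-deriving it.
    return bm25_scores(query, docs, stop_words)


def merge_union(query, vector_hits, docs, stop_words):
    # Union path: BM25 candidates are scored with the same stop words as the rerank.
    scores = bm25_scores(query, docs, stop_words)
    extra = [d for d, s in zip(docs, scores) if s > 0 and d not in vector_hits]
    return vector_hits + extra


def search(query, docs, lang=None, vector_disabled=False, vector_hits=None):
    stop_words = resolve_stop_words(lang)  # resolved once, before branching
    if vector_disabled:
        return bm25_only(query, docs, stop_words)
    return merge_union(query, vector_hits or [], docs, stop_words)


docs = ["der Hund bellt", "die Katze schläft", "der Baum"]
with_lang = search("der Hund", docs, lang="de", vector_disabled=True)
without_lang = search("der Hund", docs, lang=None, vector_disabled=True)
```

In this sketch, `with_lang` scores only the first document, while `without_lang` also gives "der Baum" a non-zero score because the article "der" is treated as a query term. Resolving once at the top means every downstream path sees the same set.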
The FTS5 candidate-selection _tokenize at the top of
_bm25_only_via_sqlite is left untouched -- chromadb's FTS5 index
is built with a trigram tokenizer that already mismatches our
\w{2,} regex, so dropping further tokens there changes selection
semantics in ways outside this PR's scope.
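
As a rough illustration of why the two tokenizations do not line up, the sketch below compares `\w{2,}` word tokens with naive character trigrams. The `char_trigrams` helper is a simplified approximation written for this example; it is not FTS5's trigram tokenizer and makes no claim about how chromadb configures it.

```python
import re


def word_tokens(text):
    """Word tokens of length >= 2, as produced by a \\w{2,} pattern."""
    return re.findall(r"\w{2,}", text.lower())


def char_trigrams(text):
    """Naive sliding-window character trigrams (approximation, not FTS5 itself)."""
    s = text.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]


text = "Hund und Katze"
words = word_tokens(text)
trigrams = char_trigrams(text)

print(words)                  # ['hund', 'und', 'katze']
print(trigrams.count("und"))  # 2: once inside "hund", once as the word "und"
```

Because the stop word "und" is also a trigram of "hund", removing it from the word-level query does not correspond to any clean change on the trigram side, which is consistent with leaving candidate selection out of scope here.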
Three tests pin the propagation: a sqlite-backed test for
_bm25_only_via_sqlite -> _bm25_scores, a unit spy on
_apply_candidate_strategy -> merger, and an end-to-end on
search_memories(vector_disabled=True) -> _bm25_only_via_sqlite.
1 parent 30138b5 commit db333f2
2 files changed
Lines changed: 127 additions & 4 deletions
File 1 of 2 (diff body not preserved): 9 hunks between original lines 433 and 1041; 22 additions, 4 deletions.
File 2 of 2 (diff body not preserved): 2 hunks; 1 line added at new line 8 and 104 lines added at new lines 526-629.