Commit 30138b5
fix(searcher): align i18n stop_words with tokenizer and fix lang docstring
_tokenize() emits only tokens matching \w{2,}, so single-character entries
in the regex.stop_words lists of ja.json and zh-CN.json could never match.
Removed the single-char kana from ja and the single-char hanzi from zh-CN;
the remaining entries are all ≥2 chars, i.e. tokens _tokenize can produce.
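The tokenizer constraint above can be illustrated with a small sketch. It assumes a hypothetical `tokenize()` that behaves like `_tokenize()` as the message describes it (`re.findall` over `\w{2,}`; the lowercasing is also an assumption). It checks which stop words can never match: a stop word is reachable only if the pattern would emit it whole. Because Python's `\w` matches kana and hanzi, an unspaced CJK run comes out as one multi-character token, so a lone "の" or "的" is never emitted.

```python
import re

# Hypothetical stand-in for the searcher's _tokenize(); only the \w{2,}
# pattern comes from the commit message, the lowercasing is assumed.
TOKEN_RE = re.compile(r"\w{2,}")


def tokenize(text: str) -> list[str]:
    """Return every run of two or more word characters in text."""
    return TOKEN_RE.findall(text.lower())


def unreachable_stop_words(stop_words: list[str]) -> list[str]:
    """Return the stop words that can never equal a token from tokenize().

    A stop word is reachable only if it is itself one full match of
    TOKEN_RE; single characters fail the {2,} quantifier.
    """
    return [w for w in stop_words if not TOKEN_RE.fullmatch(w.lower())]


def filter_tokens(text: str, stop_words: list[str]) -> list[str]:
    """Tokenize text and drop any token found in stop_words."""
    stops = {w.lower() for w in stop_words}
    return [t for t in tokenize(text) if t not in stops]
```

For example, `unreachable_stop_words(["の", "は", "これ", "それ"])` flags only "の" and "は", which matches the kind of entries this commit removed.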
Also updated the search_memories(lang=) docstring to reflect the opt-in
lang_explicit resolution implemented in fb1a133; the prior text described
the MempalaceConfig().lang fallback chain used before the opt-in change.
Reported by qodo-ai-reviewer on MemPalace#977.

1 parent 9a19af4
3 files changed
Lines changed: 8 additions & 5 deletions
| File | Original lines shown | New lines shown | Change |
|---|---|---|---|
| 1 | 37-43 | 37-43 | line 40 replaced (1 deletion, 1 addition) |
| 2 | 37-43 | 37-43 | line 40 replaced (1 deletion, 1 addition) |
| 3 | 822-830 | 822-833 | original lines 825-827 replaced by new lines 825-830 (3 deletions, 6 additions) |