feat(i18n): add entity detection to German, Spanish, and French locales #1001
Merged
igorls merged 4 commits into MemPalace:develop on Apr 21, 2026
jphein pushed a commit to jphein/mempalace that referenced this pull request on Apr 24, 2026
Restore-integrity release. Unbreaks fresh `pip install mempalace` from v3.3.2 by re-tagging current develop, which carries both the plugin.json consumer (shipped in 3.3.2) and the matching mempalace-mcp entry point in pyproject.toml (added on develop ~10h after the 3.3.2 tag via MemPalace#340 by @messelink). MemPalace#1093 diagnosed by @jphein.

Bumps (all 5 sources agree per Version Guard / CLAUDE.md):
- mempalace/version.py 3.3.2 → 3.3.3
- pyproject.toml 3.3.2 → 3.3.3
- .claude-plugin/plugin.json 3.3.2 → 3.3.3
- .claude-plugin/marketplace.json 3.3.2 → 3.3.3
- .codex-plugin/plugin.json 3.3.2 → 3.3.3
- CHANGELOG.md new [3.3.3] entry

No code changes. The fix for MemPalace#1093 is already on develop via merged PRs MemPalace#340, MemPalace#1021, MemPalace#851, MemPalace#942, MemPalace#833, MemPalace#673, MemPalace#661, MemPalace#659, MemPalace#1097, MemPalace#1051, MemPalace#1001, MemPalace#945.

Branch name intentionally outside the `release/*` ruleset so follow-up CI-fix commits aren't gated behind a nested PR. (Supersedes MemPalace#1143, closed for exactly that reason after it missed 3 of 5 version files.)

Smoke-tested locally from a fresh develop clone:
    grep mempalace-mcp pyproject.toml .claude-plugin/plugin.json   # both ✓
    python -m build --wheel                                        # ✓
    pip install …-py3-none-any.whl                                 # ✓
    which mempalace-mcp                                            # ✓
    mempalace-mcp --help                                           # ✓
Summary
Adds entity detection to the `de`, `es`, and `fr` locale JSONs. These three languages shipped in the original 8-language release (baf3c0a, PR #718) with `terms`/`cli`/`aaak`/`regex` sections but no `entity` section. Entity detection was moved into per-locale JSON by PR #911, but that refactor only retrofitted `en.json`. Subsequent locales (`ru`, `it`, `pt-br`, `hi`, `id`) arrived with dedicated entity sections, and `uk` is in-flight in #994; the original non-English 7 stayed entity-less.

For a palace configured with `MEMPALACE_LANG=de/es/fr`, `get_entity_patterns((lang,))` at `mempalace/i18n/__init__.py:284` hits `not found_any` and silently falls back to English regex patterns. After this PR, those locales get locale-specific candidate patterns, verb patterns, pronouns, dialogue markers, direct-address alternations, project-verb patterns, and stopwords.

Scope
- In scope: `de`, `es`, `fr`.
- Out of scope: `\b`-based regex entity detection does not work for CJK scripts without whitespace; a real segmenter (MeCab / KoNLPy) is needed. Deferred.

Changes
- `mempalace/i18n/de.json`
- `mempalace/i18n/es.json`
- `mempalace/i18n/fr.json`
- `tests/test_i18n.py`

Schema follows the loader contract in `mempalace/i18n/__init__.py:191-223`, for parity with existing `en.json`/`ru.json`/`it.json`/`pt-br.json`/`hi.json`/`id.json`:
- `candidate_pattern` (str), `multi_word_pattern` (str)
- `person_verb_patterns`, `pronoun_patterns`, `dialogue_patterns`, `project_verb_patterns` (lists)
- `direct_address_pattern`: singular string with `|`-alternation per loader at `:209-210` (the trap that caught "feat(i18n): add Ukrainian language support" #994)
- `stopwords` (list)

Language-specific notes
- German: nouns are capitalized, so `candidate_pattern` `[A-ZÄÖÜ][a-zäöüß]{1,19}` matches every noun in running text. Stopword coverage is more aggressive than English (days of week, months, common nouns) to keep downstream filtering tractable.
- French: verb patterns target the passé composé (`a dit`, `a demandé`), so most `person_verb_patterns` cover the auxiliary + participle form. Apostrophe-elided articles (`l'architecture`) are handled literally in a few patterns; native speakers may find contractions that slip through.
- Spanish: verb patterns cover the preterite (`dijo`, `preguntó`, `decidió`) plus present (`piensa`, `quiere`). Reflexive verb endings are covered at a basic level; native review would catch missed inflection forms.

New schema-invariant test
`test_direct_address_key_is_singular_string_for_all_locales` asserts that any locale declaring direct-address uses the singular `direct_address_pattern` (str) key, never the plural `direct_address_patterns` (list). The plural name is the output shape of the merged dict, not the input shape of locale files; declaring the plural silently drops every direct-address pattern in that locale. This test would have caught #994 pre-review and guards every future locale.

Verification
- `python -m pytest tests/test_i18n.py tests/test_i18n_lang_case.py -v`: 18/18 pass (10 existing i18n smoke + 4 new entity + 4 case-insensitivity)
- `python -m pytest tests/ --ignore=tests/benchmarks`: 565 pass, 1 pre-existing env-flake unrelated to i18n (`tests/test_mcp_stdio_protection.py::test_module_import_redirects_stdout_to_stderr` fails on clean `upstream/develop` too, due to a missing transitive `httpx` dependency in my local env)
- `ruff check .` clean, `ruff format --check .` clean
- `get_entity_patterns(("de","es","fr"))` returns non-empty pattern lists per locale (de: 15 verbs / 9 pronouns / 197 stopwords; es: 15/9/190; fr: 15/9/180)
- Direct-address greetings (`hallo Peter`, `hola María`, `bonjour Pierre`, etc.) match; non-address text (`Pierre est arrivé`) is correctly rejected

Disclosure
The German, Spanish, and French regex patterns and stopword lists in this PR were drafted by Claude with editorial review on my end for structural consistency against existing `en.json`/`it.json`/`pt-br.json`/`ru.json` locales. I don't claim native speaker expertise in any of these languages, so translations should be scrutinized before merge, particularly:
- German: `candidate_pattern` matches every noun, producing high candidate noise. Stopword coverage matters more here than in other locales.
- French: apostrophe elision (`j'ai`, `l'`, `d'`) in verb patterns is brittle; contractions may break `person_verb_patterns`.

Maintainers or native speakers, if you spot errors, please push corrections directly or flag lines in a review.
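To make the German noise concern concrete, here is a minimal sketch that runs the `candidate_pattern` quoted in this PR over one sentence. The stopword subset, the lowercase comparison, and the `find_candidates` helper are assumptions for illustration only; they are not taken from `de.json` or the loader.

```python
import re

# Candidate pattern quoted in this PR for German.
CANDIDATE = re.compile(r"[A-ZÄÖÜ][a-zäöüß]{1,19}")

# Hypothetical stopword subset for illustration; not copied from de.json.
STOPWORDS = frozenset({"der", "montag", "haus"})


def find_candidates(text, stopwords=frozenset()):
    """Return capitalized tokens, minus any whose lowercase form is a stopword."""
    return [w for w in CANDIDATE.findall(text) if w.lower() not in stopwords]


sentence = "Der Hund von Peter schläft am Montag im Haus."
raw = find_candidates(sentence)
filtered = find_candidates(sentence, STOPWORDS)
print(raw)       # ['Der', 'Hund', 'Peter', 'Montag', 'Haus']
print(filtered)  # ['Hund', 'Peter']
```

Only `Peter` is a person here; `Hund` survives because it is not in the illustrative stopword set, which is the kind of residue native review of the real list would help reduce.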
Test plan
- `ruff check` + `ruff format --check` clean
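For reviewers who want to see the shape of the singular-key invariant described under "New schema-invariant test", the following is a rough sketch, not the test shipped in `tests/test_i18n.py`. It assumes each locale JSON keeps its entity fields under a top-level `entity` key; the `locales_violating_invariant` helper and the two throwaway locale files are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def locales_violating_invariant(i18n_dir):
    """Return locale file names that declare direct-address in the wrong shape."""
    offenders = []
    for path in sorted(Path(i18n_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        entity = data.get("entity", {})  # assumed section name
        if "direct_address_patterns" in entity:
            # Plural key: the output shape of the merged dict, not valid input.
            offenders.append(path.name)
        elif not isinstance(entity.get("direct_address_pattern", ""), str):
            # Singular key present but not a string.
            offenders.append(path.name)
    return offenders


# Demo on two throwaway locale files rather than the real i18n directory.
with tempfile.TemporaryDirectory() as tmp:
    good = {"entity": {"direct_address_pattern": "hallo|hey"}}
    bad = {"entity": {"direct_address_patterns": ["hallo", "hey"]}}
    Path(tmp, "good.json").write_text(json.dumps(good), encoding="utf-8")
    Path(tmp, "bad.json").write_text(json.dumps(bad), encoding="utf-8")
    offenders = locales_violating_invariant(tmp)

print(offenders)  # ['bad.json']
```

Scanning every file in the directory, rather than a fixed list of locales, is what lets a check like this cover locales added later.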