feat(i18n): add Hebrew language support by shaibachar · Pull Request #1031 · MemPalace/mempalace

shaibachar · 2026-04-19T11:52:51Z

What does this PR do?

Adds Hebrew language support to the i18n/entity-detection layer.

Adds a new Hebrew locale file at /abs/path/mempalace/i18n/he.json
Extends i18n coverage tests to include Hebrew sample text in /abs/path/mempalace/tests/test_i18n.py
Adds Hebrew-specific entity detection tests in /abs/path/mempalace/tests/test_entity_detector.py

How to test

Run:

python -m pytest tests/test_i18n.py tests/test_entity_detector.py -v
python -m pytest tests/ -v

Expected:

Hebrew locale loads successfully
Hebrew sample compression test passes
Hebrew entity candidate extraction and person-verb scoring tests pass

Checklist

Tests pass (python -m pytest tests/ -v)
No hardcoded paths
Linter passes (ruff check .)

release: sync develop → main (v3.3.0 manifest, SECURITY.md, version guard, Pages CNAME)

Bumps version across pyproject.toml, mempalace/version.py, README badge, and uv.lock. Finalizes the 3.3.0 CHANGELOG section (was still labeled 'Unreleased') and adds a 3.3.1 section covering the multi-language entity-detection infra and the five new locales landed since 2026-04-13.

Highlights:
- Multi-language entity detection infra (MemPalace#911) + script-aware word boundaries for combining-mark scripts (MemPalace#932) + BCP 47 case-insensitive locale resolution (MemPalace#928) + i18n patterns wired into miner/palace/entity_registry (MemPalace#931)
- Five new fully-supported locales: pt-br (MemPalace#156), ru (MemPalace#760), it (MemPalace#907), hi (MemPalace#773), id (MemPalace#778)
- UTF-8 encoding fix on read_text() calls for non-UTF-8 Windows locales (MemPalace#946)
- KnowledgeGraph lock correctness (MemPalace#884, MemPalace#887)
- Various smaller fixes and improvements

Advisor caught: initial boundary (962776c..develop) skipped PRs that landed on develop after v3.3.0 tag but before the sync-back merge. Adds entries for MemPalace#871 MEMPAL_VERBOSE, MemPalace#811 research() local-only default, MemPalace#866 init .gitignore, MemPalace#864 MCP stdout redirect, MemPalace#863 precompact hook, MemPalace#865 searcher empty results, MemPalace#831 cold-start palace, MemPalace#862 init help, MemPalace#815 Slack provenance, MemPalace#840 save hook auto-mine. Also drops the awkward caveat on MemPalace#846 created_at — it's post-v3.3.0.

The version-guard workflow checks that five sources agree: mempalace/version.py, pyproject.toml, .claude-plugin/marketplace.json, .claude-plugin/plugin.json, .codex-plugin/plugin.json. The initial release commit missed the three plugin manifests.
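A check of this kind can be sketched as below. This is only an illustration of the idea, not the project's workflow: the helper name `group_by_version` is hypothetical, and the version strings are made-up values standing in for what each of the five files might report.

```python
from collections import defaultdict


def group_by_version(versions):
    """Group source files by the version string each one reports.

    `versions` maps a source path to its version. More than one group
    in the result means the sources disagree.
    """
    groups = defaultdict(list)
    for source, version in versions.items():
        groups[version].append(source)
    return {version: sorted(paths) for version, paths in groups.items()}


# Illustrative values only: two files bumped, three manifests left behind.
reported = {
    "mempalace/version.py": "3.3.1",
    "pyproject.toml": "3.3.1",
    ".claude-plugin/marketplace.json": "3.3.0",
    ".claude-plugin/plugin.json": "3.3.0",
    ".codex-plugin/plugin.json": "3.3.0",
}

groups = group_by_version(reported)
versions_agree = len(groups) == 1

if not versions_agree:
    for version, paths in sorted(groups.items()):
        print(version, "->", ", ".join(paths))
```

Grouping by version, instead of only returning a boolean, makes it obvious which files were left behind.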

…gin-manifests release: bump plugin manifests to 3.3.1

release: v3.3.1

mvalentsev · 2026-04-19T12:21:35Z

A couple of things I noticed while reading the diff:

Scope: this PR is bundling a release bump with the locale addition. The Hebrew scope is he.json plus the two test files, but the diff also touches pyproject.toml, mempalace/version.py, uv.lock, the README.md badge, all three plugin manifests (.claude-plugin/marketplace.json, .claude-plugin/plugin.json, .codex-plugin/plugin.json), and adds a full [3.3.1] CHANGELOG block that attributes 20+ other merged PRs (#156, #760, #907, #773, #778, #911, #932, #928, #931, #946, #758, #876, ...). Those version bumps and release notes are usually maintainer work, and keeping them here will conflict with whatever 3.3.1 looks like when it is actually cut. Reverting the non-Hebrew files would leave a cleaner PR to review.

regex.stop_words catches domain nouns: the current list filters ארמון (palace), אגף (wing), ארון (closet), מגירה (drawer), plus generic terms like קובץ, קוד, בדיקה, פרויקט, עבודה. BM25 will strip those from Hebrew queries and documents, so searches that use the palace vocabulary or common project nouns will miss their natural matches. Tightening the list to function words (prepositions, pronouns, copulas) matches what the other locales ship.
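The effect can be illustrated with a small sketch. It assumes a plain whitespace tokenizer and a set-based stop-word filter, which may differ from what the project's BM25 path actually does; the function name, the sample query, and the short function-word list are hypothetical.

```python
# Domain and project nouns named in the comment above.
DOMAIN_NOUNS = {
    "ארמון", "אגף", "ארון", "מגירה",
    "קובץ", "קוד", "בדיקה", "פרויקט", "עבודה",
}

# A few Hebrew function words (of, on, with, object marker, he, she).
FUNCTION_WORDS = {"של", "על", "עם", "את", "הוא", "היא"}


def tokens_for_search(text, stop_words):
    """Split on whitespace and drop stop words (assumed tokenizer)."""
    return [token for token in text.split() if token not in stop_words]


query = "קובץ של פרויקט ארמון"

# With domain nouns in the stop list, nothing is left to match on.
broad = tokens_for_search(query, DOMAIN_NOUNS | FUNCTION_WORDS)

# With function words only, the content-bearing nouns survive.
narrow = tokens_for_search(query, FUNCTION_WORDS)

print(len(broad), len(narrow))
```

Under these assumptions the broad list leaves an empty query, while the function-word list keeps the three nouns a search would need.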

Duplicate את entry: "את" appears twice in the entity.stopwords array (as the 2nd item and again as the 22nd). The same token also repeats inside the space-separated regex.stop_words string. Stopword lookup dedupes on match, so the repeats are harmless but likely unintended.
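Repeats like this are easy to catch mechanically. The sketch below uses made-up sample data, not the actual contents of he.json, and `find_duplicates` is a hypothetical helper; it handles both the array form and the space-separated string form.

```python
from collections import Counter


def find_duplicates(items):
    """Return the sorted list of entries that occur more than once."""
    counts = Counter(items)
    return sorted(item for item, count in counts.items() if count > 1)


# Sample data for illustration only.
entity_stopwords = ["של", "את", "על", "את"]
regex_stop_words = "של את על עם את"

array_dupes = find_duplicates(entity_stopwords)
string_dupes = find_duplicates(regex_stop_words.split())

print(array_dupes, string_dupes)
```

A check like this could run in the locale tests so that duplicates are flagged before review.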

boundary_chars: Hebrew has the same \b problem as Devanagari and Arabic that #932 introduced boundary_chars for. Without it, \b{name}\b person-verb patterns will not fire reliably when Hebrew names adjoin Hebrew text (ן, ם, י endings in particular). Adding a boundary_chars field with the Hebrew block would let the loader expand \b correctly.
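One way such an expansion could work is to replace `\b` with lookarounds over the script's character range. The thread does not show how the loader really does it, so `expand_boundaries` and the lookaround form are assumptions. The case reproduced here is pointed (vocalized) text: in Python's `re`, Hebrew points are combining marks that do not count as word characters, so `\b` does not fire between a final point and the following space.

```python
import re

# Unicode Hebrew block, used here as the boundary_chars value (assumed).
HEBREW_BLOCK = "\u0590-\u05FF"


def expand_boundaries(name, boundary_chars=HEBREW_BLOCK):
    """Match `name` only when not adjacent to another boundary character.

    Hypothetical stand-in for a loader that expands \\b using boundary_chars.
    """
    escaped = re.escape(name)
    return re.compile(
        f"(?<![{boundary_chars}]){escaped}(?![{boundary_chars}])"
    )


# A pointed name whose last code point is a combining mark
# (final kaf U+05DA followed by sheva U+05B0).
name = "\u05D1\u05B8\u05BC\u05E8\u05D5\u05BC\u05DA\u05B0"
text = name + " \u05D0\u05DE\u05E8"  # name, space, a following Hebrew word

naive = re.compile(r"\b" + re.escape(name) + r"\b")
expanded = expand_boundaries(name)

naive_hit = naive.search(text) is not None
expanded_hit = expanded.search(text) is not None

# A name glued to a Hebrew prefix letter (vav, U+05D5) is not a standalone
# occurrence, and the lookbehind rejects it.
prefixed_hit = expanded.search("\u05D5" + name) is not None

print(naive_hit, expanded_hit, prefixed_hit)
```

The name is written with escapes so the exact code points are unambiguous; in this sketch the naive pattern misses the name and the lookaround pattern finds it.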

shaibachar · 2026-04-19T15:30:40Z

I will clean up the PR and address the comments.

igorls and others added 7 commits April 14, 2026 12:35

Merge pull request MemPalace#878 from MemPalace/develop

73f038c

release: sync develop → main (v3.3.0 manifest, SECURITY.md, version guard, Pages CNAME)

release: bump plugin manifests to 3.3.1

05ad2dc

The version-guard workflow checks that five sources agree: mempalace/version.py, pyproject.toml, .claude-plugin/marketplace.json, .claude-plugin/plugin.json, .codex-plugin/plugin.json. The initial release commit missed the three plugin manifests.

Merge pull request MemPalace#958 from MemPalace/fix/release-3.3.1-plu…

b552bcf

…gin-manifests release: bump plugin manifests to 3.3.1

Merge pull request MemPalace#957 from MemPalace/release/3.3.1

6889c6f

release: v3.3.1

feat(i18n): add Hebrew language support

dca5946

shaibachar requested review from bensig, igorls and milla-jovovich as code owners April 19, 2026 11:52

shaibachar changed the title from "Add heb lang" to "feat(i18n): add Hebrew language support" Apr 19, 2026

shaibachar closed this Apr 19, 2026

shaibachar wants to merge 7 commits into MemPalace:develop from shaibachar:add-heb-lang

3 participants
