feat(i18n): add Vietnamese language support by TanNhatCMS · Pull Request #1059 · MemPalace/mempalace

TanNhatCMS · 2026-04-21T03:48:16Z

What does this PR do?

Fixes Vietnamese i18n direct-address test regression by restoring compatibility for both keys:
- direct_address_pattern (legacy/test usage)
- direct_address_patterns (current internal usage)
Expands Vietnamese i18n coverage in the existing test file tests/test_i18n.py:
- EN/VI schema parity checks
- CLI placeholder parity checks
- Regex compile + sample matching checks
- Multi-word Vietnamese entity matching
- Person/pronoun/project/stopword signal checks
Keeps language-case behavior coverage in tests/test_i18n_lang_case.py intact.
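The EN/VI schema parity and CLI placeholder parity checks listed above could be sketched roughly as follows. This sketch assumes that locales are nested JSON objects and that CLI strings use Python `str.format`-style `{name}` placeholders under a `cli` section. The `cli` and `regex` key names, the sample strings, and the helpers `key_paths`, `placeholders` and `check_parity` are illustrative assumptions, not the actual layout of en.json/vi.json or functions that exist in mempalace.

```python
import string
from typing import Any

# Illustrative locale fragments (hypothetical); the real en.json / vi.json layout may differ.
EN = {"lang": "en", "cli": {"mined": "Mined {count} files from {path}"},
      "regex": {"stop_words": ["the"]}}
VI = {"lang": "vi", "cli": {"mined": "Đã khai thác {count} tệp từ {path}"},
      "regex": {"stop_words": ["là"]}}


def key_paths(node: Any, prefix: str = "") -> set[str]:
    """Collect dotted paths for every nested dict key; non-dict values are leaves."""
    paths: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


def placeholders(template: str) -> set[str]:
    """Names of str.format-style fields, e.g. 'from {path}' -> {'path'}."""
    return {field for _, field, _, _ in string.Formatter().parse(template)
            if field is not None}


def check_parity(en: dict, vi: dict) -> list[str]:
    """Report schema keys and CLI placeholders that differ between the two locales."""
    en_keys, vi_keys = key_paths(en), key_paths(vi)
    problems = [f"missing in vi: {p}" for p in sorted(en_keys - vi_keys)]
    problems += [f"extra in vi: {p}" for p in sorted(vi_keys - en_keys)]
    vi_cli = vi.get("cli", {})
    for key, en_text in en.get("cli", {}).items():
        vi_text = vi_cli.get(key)
        # Only compare when both sides are strings; missing keys are reported above.
        if isinstance(en_text, str) and isinstance(vi_text, str):
            if placeholders(en_text) != placeholders(vi_text):
                problems.append(f"placeholder mismatch in cli.{key}")
    return problems


assert check_parity(EN, VI) == []
```

Comparing placeholder sets rather than their order lets the Vietnamese string reorder `{count}` and `{path}` freely while still catching a renamed or dropped field.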

How to test

. .\.venv\Scripts\Activate.ps1
python -m pytest tests/ -v
ruff check .

Expected result from latest run:

python -m pytest tests/ -v -> 1044 passed, 1 skipped, 106 deselected
ruff check . -> All checks passed!

Checklist

- Tests pass (python -m pytest tests/ -v)
- No hardcoded paths
- Linter passes (ruff check .)

igorls · 2026-04-21T03:57:43Z

Thanks for the Vietnamese locale. vi.json itself looks reasonable, but I need the PR scope trimmed before I can merge. Right now it touches 13 files with 465 additions, and most of that isn't Vietnamese-related. Please address the following:

- Unrelated ruff-reformat churn in 9 files: backends/chroma.py, tests/test_closet_llm.py, tests/test_closets.py, tests/test_convo_miner.py, tests/test_mcp_server.py, tests/test_mcp_stdio_protection.py, tests/test_normalize.py, tests/test_readme_claims.py, tests/test_sweeper.py. It looks like a newer ruff version reformatted them locally. Drop these from this PR; if you want the reformat merged, open a separate PR for it.
- API surface change in mempalace/i18n/__init__.py: the PR adds direct_address_pattern (singular) as an alias of direct_address_patterns (plural) in the merged output dict. The loader has only ever emitted the plural key in its output; the singular key belongs to the JSON input schema. Your Vietnamese test handles both isinstance(p, str) and isinstance(p, re.Pattern) branches, which suggests it was written against the wrong key. Please fix the test to use direct_address_patterns instead of adding the alias; the alias would also conflict with the schema-invariant test added in feat(i18n): add entity detection to German, Spanish, and French locales #1001.
- Missing end-of-file newline in vi.json.
- Prune multi-word entries from regex.stop_words: many Vietnamese particles in the list are multi-word (cái gì, cái nào, người ta, etc.). The tokenizer extracts tokens with \w{2,}, which never matches a space, so space-containing entries can never equal a token and never fire. Keep single-word tokens only. (See feat(searcher): wire i18n stop words into BM25 tokenizer (#973) #977 for the same issue fixed on ja / zh-CN.)
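To illustrate the stop-word item, here is a minimal sketch assuming the BM25 tokenizer extracts tokens with a regex equivalent to `\w{2,}`, as described above. The lowercasing and NFC normalization are further assumptions, and `tokenize`, `prune_stop_words` and `remove_stop_words` are hypothetical helpers, not mempalace functions. An entry such as "cái gì" can never equal a token, so only entries that fully match the token pattern are worth keeping.

```python
import re
import unicodedata

# Assumption: mirrors the review's description of the BM25 tokenizer (\w{2,}).
# Lowercasing and NFC normalization are also assumptions, not confirmed behavior.
TOKEN_RE = re.compile(r"\w{2,}")

# Illustrative entries: three multi-word phrases from the review plus a few particles.
STOP_WORDS = ["cái gì", "cái nào", "người ta", "của", "và", "thì", "à"]


def tokenize(text: str) -> list[str]:
    # NFC keeps letters like "ư" or "ờ" as single code points; decomposed (NFD)
    # combining marks are not \w in Python's re module and would split a word apart.
    return TOKEN_RE.findall(unicodedata.normalize("NFC", text).lower())


def prune_stop_words(stop_words: list[str]) -> list[str]:
    """Hypothetical helper: keep only entries the tokenizer could ever emit."""
    kept = []
    for word in stop_words:
        normalized = unicodedata.normalize("NFC", word).lower()
        # fullmatch rejects phrases containing spaces and 1-character entries like "à".
        if TOKEN_RE.fullmatch(normalized):
            kept.append(normalized)
    return kept


def remove_stop_words(text: str, stop_words: list[str]) -> list[str]:
    """Tokenize text and drop tokens found in the pruned stop-word set."""
    stops = set(prune_stop_words(stop_words))
    return [token for token in tokenize(text) if token not in stops]
```

Using the tokenizer's own pattern as the filter keeps the two in sync: any entry that survives pruning is, by construction, something `tokenize` can produce.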

Once this is scoped to mempalace/i18n/vi.json + test additions in tests/test_i18n.py only, it'll be quick to review and merge.
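A minimal sketch of the test fix requested in the API surface item above: read only the plural `direct_address_patterns` key from the loader's output and expect compiled patterns. `load_locale` is a hypothetical stand-in for the real mempalace i18n loader, and both the single-string input shape and the `ơi` vocative pattern ("Lan ơi, ...") are illustrative assumptions; the real loader, key layout under `regex`, and vi.json patterns may differ.

```python
import json
import re

# Hypothetical locale fragment: per the review, the JSON *input* schema uses the
# singular key, while the loader's merged *output* only exposes the plural one.
VI_JSON = json.dumps(
    {"regex": {"direct_address_pattern": r"\b(\w+)\s+ơi\b"}}, ensure_ascii=False
)


def load_locale(raw: str) -> dict:
    """Hypothetical stand-in for the mempalace i18n loader (not its real API)."""
    data = json.loads(raw)
    source = data.get("regex", {}).get("direct_address_pattern")
    patterns = [re.compile(source, re.IGNORECASE)] if source else []
    return {"direct_address_patterns": patterns}


def test_vi_direct_address_patterns():
    merged = load_locale(VI_JSON)
    # Assert against the plural output key only; no singular alias is expected.
    assert "direct_address_pattern" not in merged
    patterns = merged["direct_address_patterns"]
    # Output entries are compiled, so no isinstance(p, str) branch is needed.
    assert patterns and all(isinstance(p, re.Pattern) for p in patterns)
    match = patterns[0].search("Lan ơi, mai họp lúc mấy giờ?")
    assert match is not None and match.group(1) == "Lan"
```

Asserting that the singular key is absent also documents the reviewer's point that the alias should not be reintroduced in the merged output.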

TanNhatCMS added 4 commits April 21, 2026 10:15

- 195007d feat(i18n): add Vietnamese language support
- 3b5fc1a feat(i18n): add Vietnamese language support
- 1315c38 feat(i18n): add Vietnamese language support
- b895287 chore: format with ruff

TanNhatCMS requested review from bensig, igorls and milla-jovovich as code owners April 21, 2026 03:48

TanNhatCMS closed this Apr 21, 2026

TanNhatCMS wants to merge 4 commits into MemPalace:develop from TanNhatCMS:feat/i18n-vietnamese
