feat(i18n): add Vietnamese language support #1059

Closed
TanNhatCMS wants to merge 4 commits into MemPalace:develop from TanNhatCMS:feat/i18n-vietnamese

@TanNhatCMS

What does this PR do?

  • Fixes Vietnamese i18n direct-address test regression by restoring compatibility for both keys:
    • direct_address_pattern (legacy/test usage)
    • direct_address_patterns (current internal usage)
  • Expands Vietnamese i18n coverage in existing test file tests/test_i18n.py:
    • EN/VI schema parity checks
    • CLI placeholder parity checks
    • Regex compile + sample matching checks
    • Multi-word Vietnamese entity matching
    • Person/pronoun/project/stopword signal checks
  • Keeps language-case behavior coverage in tests/test_i18n_lang_case.py intact.
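
The EN/VI schema parity and CLI placeholder parity checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code in tests/test_i18n.py: the helper names key_paths and placeholders are hypothetical, the two inline dictionaries are trimmed stand-ins for the real en.json and vi.json, and the sketch assumes CLI strings use str.format-style {name} placeholders.

```python
from string import Formatter


def key_paths(node, prefix=""):
    """Return the set of dotted key paths found in a nested dict."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


def placeholders(template):
    """Return the set of {field} names used in a format string."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


# Hypothetical, heavily trimmed stand-ins for en.json and vi.json.
en = {"cli": {"greeting": "Hello {name}"}, "regex": {"stop_words": ["the"]}}
vi = {"cli": {"greeting": "Xin chào {name}"}, "regex": {"stop_words": ["và"]}}

# Schema parity: both locales expose exactly the same key paths.
assert key_paths(en) == key_paths(vi)

# Placeholder parity: a translated string keeps the same fields.
assert placeholders(en["cli"]["greeting"]) == placeholders(vi["cli"]["greeting"])
```

Comparing key paths rather than whole dictionaries lets the translated values differ while still catching a key that exists in only one locale.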

How to test

. .\.venv\Scripts\Activate.ps1
python -m pytest tests/ -v
ruff check .

Expected result from latest run:

  • python -m pytest tests/ -v -> 1044 passed, 1 skipped, 106 deselected
  • ruff check . -> All checks passed!

Checklist

  • Tests pass (python -m pytest tests/ -v)
  • No hardcoded paths
  • Linter passes (ruff check .)

@igorls
Member

igorls commented Apr 21, 2026

Thanks for the Vietnamese locale — vi.json itself looks reasonable, but I need the PR scope trimmed before I can merge. Right now it touches 13 files with 465 additions, and most of that isn't Vietnamese-related. Please drop:

  1. Unrelated ruff-reformat churn in 9 files — backends/chroma.py, tests/test_closet_llm.py, tests/test_closets.py, tests/test_convo_miner.py, tests/test_mcp_server.py, tests/test_mcp_stdio_protection.py, tests/test_normalize.py, tests/test_readme_claims.py, tests/test_sweeper.py. Looks like a newer ruff version reformatted them locally. Drop these from this PR; if you want the reformat merged, open a separate PR for it.

  2. API surface change in mempalace/i18n/__init__.py: the PR adds direct_address_pattern (singular) as an alias of direct_address_patterns (plural) in the merged output dict. The loader has only ever emitted the plural key in output; the singular belongs to the JSON input schema. Your Vietnamese test handles both isinstance(p, str) and isinstance(p, re.Pattern) branches, which suggests the test was written against the wrong key. Please fix the test to use direct_address_patterns instead of adding the alias; the alias would also conflict with the schema-invariant test added in #1001 (feat(i18n): add entity detection to German, Spanish, and French locales).

  3. vi.json end-of-file newline — missing.

  4. Prune multi-word entries from regex.stop_words: many Vietnamese particles in the list are multi-word (cái gì, cái nào, người ta, etc.). The tokenizer extracts whitespace-free word tokens (\w{2,}), so space-containing entries never fire. Keep single-word tokens only. (See #977, feat(searcher): wire i18n stop words into BM25 tokenizer (#973), for the same issue fixed on ja / zh-CN.)
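
Item 4 can be illustrated with a short sketch. It assumes the tokenizer amounts to re.findall(r"\w{2,}", text) on lowercased text and that stop words are compared against individual tokens; both are readings of the comment above, not the actual searcher code. The helper names tokenize and prune_multiword are hypothetical, and the NFC normalization step is an added precaution so that accented Vietnamese letters are treated as single word characters.

```python
import re
import unicodedata

TOKEN_RE = re.compile(r"\w{2,}")  # pattern quoted in the review comment


def tokenize(text):
    """Lowercase, NFC-normalize, and extract word tokens (assumed behaviour)."""
    return TOKEN_RE.findall(unicodedata.normalize("NFC", text.lower()))


def prune_multiword(stop_words):
    """Keep only entries that could ever equal a single token."""
    return [word for word in stop_words if len(word.split()) == 1]


stop_words = ["cái gì", "cái nào", "người ta", "và", "là"]
tokens = tokenize("Người ta hỏi cái gì và ở đâu")

# No token contains whitespace, so an entry such as "cái gì" can never
# be equal to any token, no matter what the input text is.
assert all(" " not in token for token in tokens)
assert "cái gì" not in tokens

single_word_only = prune_multiword(stop_words)
```

After pruning, single_word_only keeps only the last two entries of the sample list, which are the ones a per-token comparison can actually match.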

Once this is scoped to mempalace/i18n/vi.json + test additions in tests/test_i18n.py only, it'll be quick to review and merge.

@TanNhatCMS TanNhatCMS closed this Apr 21, 2026