Changelog

All notable changes to MemPalace are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.


[3.3.1] — 2026-04-16

New Features

Multi-language entity detection — lexical patterns (person verbs, pronouns, dialogue markers, project verbs, stopwords, candidate character classes) now live in the optional entity section of each locale JSON under mempalace/i18n/<lang>.json. Every public function in entity_detector accepts a languages= tuple and unions patterns across enabled locales. Default stays ("en",) so existing English-only callers are unchanged. (#911)

  • Five new fully supported locales with CLI strings, AAAK compression instructions, and entity-detection patterns:
    • Brazilian Portuguese pt-br (#156)
    • Russian ru (#760)
    • Italian it (#907)
    • Hindi hi (#773)
    • Indonesian id (#778)
  • MempalaceConfig.entity_languages — persistent palace-level language selection; MEMPALACE_ENTITY_LANGUAGES env override; mempalace init --lang en,pt-br flag that saves to ~/.mempalace/config.json (#911)
  • Per-language candidate_pattern — non-Latin scripts register their own character class, so names like João, Инна, राज are no longer silently dropped by the ASCII-only default (#911)
  • VSCode devcontainer matching the CI environment (#881)
  • MEMPAL_VERBOSE env toggle — developers see diaries surfaced in chat while the default remains silent (#871)
  • created_at timestamps included in search results (#846)
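
The per-language candidate_pattern entry above can be pictured with a small sketch: each enabled locale contributes a character class, and the classes are unioned into one candidate regex. The CANDIDATE_CLASSES table, its contents, and candidate_regex() are hypothetical stand-ins for illustration, not MemPalace's actual locale data or API.

```python
import re

# Hypothetical per-locale (start, body) character classes.
# Real locale JSON files may define their candidate patterns differently.
CANDIDATE_CLASSES = {
    "en": ("A-Z", "a-z"),
    "pt-br": ("A-ZÀ-ÖØ-Þ", "a-zß-öø-ÿ"),
    "ru": ("А-ЯЁ", "а-яё"),
}

def candidate_regex(languages=("en",)):
    """Union the start/body classes of every enabled locale into one pattern."""
    starts = "".join(CANDIDATE_CLASSES[lang][0] for lang in languages)
    bodies = "".join(CANDIDATE_CLASSES[lang][1] for lang in languages)
    # One capitalised start character followed by at least two body characters.
    return re.compile(f"[{starts}][{bodies}]{{2,}}")

text = "João met Инна and Alice."
english_only = candidate_regex(("en",)).findall(text)
multilingual = candidate_regex(("en", "pt-br", "ru")).findall(text)
```

With the ASCII-only class, only "Alice" survives; with the three locales enabled, "João" and "Инна" are picked up as well.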

Bug Fixes

i18n / Unicode

  • Script-aware word boundaries for combining-mark scripts: Python's \b fails on Devanagari vowel signs (ा ी ु), Arabic, Hebrew, Thai, Tamil, Khmer etc., truncating names like अनीता to अनीत so that person-verb patterns never fire. Locales now declare an optional boundary_chars field, and the i18n loader expands \b into a script-aware lookaround boundary (#932)
  • Case-insensitive BCP 47 language code resolution: --lang PT-BR, zh-cn, and Pt-Br previously fell through to English silently; they now resolve to the canonical locale file via lowercase matching, and the entity-pattern cache is keyed on the canonical form so casing variations share one cache entry (#928)
  • Wire i18n candidate patterns into miner._extract_entities_for_metadata(), palace.build_closet_lines(), and entity_registry.extract_unknown_candidates() — three code paths that still hardcoded ASCII-only [A-Z][a-z]{2,} and silently missed Cyrillic, accented Latin, and non-Latin entity metadata tags (#931)
  • Explicit encoding="utf-8" on Path.read_text() calls across entity_registry, instructions_cli, split_mega_files, and onboarding tests — prevents Windows GBK (and other non-UTF-8) locales from corrupting UTF-8 files (#946, #776)
  • ko.json status_drawers used {drawers} instead of {count}, showing the raw template string instead of the number (#758)
  • Move test_i18n.py from inside the installed package into tests/ so pytest actually collects it; remove the sys.path.insert hack (#758)
  • Dialect.from_config() defaulted to current_lang() (module-global) when config had no lang key — replaced with explicit "en" fallback for determinism (#758)
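
To make the word-boundary fix above concrete, here is a minimal sketch of expanding \b into a lookaround boundary. Python's re treats Devanagari vowel signs as non-word characters, so \b fires in the middle of a name. The helper names and the DEVANAGARI range used as boundary_chars are assumptions for illustration; the actual loader may build its boundary differently.

```python
import re

def script_boundary(boundary_chars):
    """Return a lookaround that is true where exactly one side is a class character."""
    cls = f"[{boundary_chars}]"
    return f"(?:(?<!{cls})(?={cls})|(?<={cls})(?!{cls}))"

def expand_boundaries(pattern, boundary_chars):
    # Simplification: a plain string replace would also rewrite an escaped
    # backslash that happens to be followed by "b".
    return pattern.replace(r"\b", script_boundary(boundary_chars))

DEVANAGARI = r"\u0900-\u097F"      # assumed boundary_chars value for Hindi
NAME = r"\b[\u0900-\u097F]+\b"     # hypothetical candidate pattern

text = "अनीता ने कहा"
# Stock \b: the trailing vowel sign is not a word character, so the match is cut short.
naive = re.search(NAME, text).group()
# Expanded boundary: the whole first word is matched.
aware = re.search(expand_boundaries(NAME, DEVANAGARI), text).group()
```

Here naive ends one character early, while aware covers the complete name.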

Other

  • Guard KnowledgeGraph.close() and query_relationship/timeline/stats methods with the instance lock to prevent concurrent-access corruption (#887, #884)
  • Replace invalid {"decision": "allow"} with {} in hook responses — the string wasn't a valid decision value and triggered schema warnings (#885)
  • entity_registry.research() defaults to local-only — previously made outbound Wikipedia HTTPS requests without explicit user opt-in; callers now must pass allow_network=True (#811)
  • Precompact hook no longer blocks compaction when it fails or takes too long (#856, #858, #863)
  • Redirect stdout to stderr during MCP server import so library logging can't corrupt the JSON-RPC channel (#225, #864)
  • mempalace init auto-adds per-project files to .gitignore in git repositories so users don't accidentally commit mempalace.yaml / entities.json (#185, #866)
  • Searcher guards against empty ChromaDB query results that previously raised on edge-case corpora (#195, #865)
  • Return empty status instead of an error on a cold-start palace with no drawers yet (#830, #831)
  • Restrict file permissions on sensitive palace data (#814)
  • Slack transcript importer writes a provenance header and preserves speaker IDs (#815)
  • Allow mempalace mine to run in directories without a local mempalace.yaml and surface the missing-yaml warning on stderr (#604)
  • Security hook injection fix (#812)
  • Save hook auto-mines transcripts even when MEMPAL_DIR is unset (#840)
  • Pin the Pages custom domain via a shipped CNAME in the deploy artifact (#877)
  • Version drift safeguard — sync pyproject + version.py + README badge in one place (#876)
  • Deploy docs workflow now runs on develop only, preventing accidental main-branch deploys (#845)
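
The stdout-to-stderr redirect mentioned above (#225, #864) can be sketched with the standard library alone. In the sketch, noisy_loader stands in for importing a library that prints at import time, and call_with_stdout_on_stderr is a hypothetical helper name; the real server may wire this up differently.

```python
import contextlib
import io
import sys

def call_with_stdout_on_stderr(loader):
    """Run loader() while anything it prints to stdout goes to stderr instead."""
    with contextlib.redirect_stdout(sys.stderr):
        return loader()

def noisy_loader():
    # Stands in for a library that logs to stdout while being imported.
    print("library banner: initialised")
    return {"status": "ready"}

# Capture both streams so the effect of the redirect is visible.
captured_out, captured_err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(captured_out), contextlib.redirect_stderr(captured_err):
    server_state = call_with_stdout_on_stderr(noisy_loader)
```

After the call, captured_out is empty and the banner sits in captured_err, so a JSON-RPC stream written to stdout would stay clean.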

Improvements

  • Regex compilation optimization for entity extraction — pre-compile per-entity pattern sets once and cache by (name, languages) tuple, so multi-language callers don't thrash the cache (#880)
  • Knowledge-graph value sanitization now preserves natural punctuation (commas, colons, parentheses) that commonly appears in KG subject/object values (#873)
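
As an illustration of caching compiled patterns by a (name, languages) key, the sketch below uses functools.lru_cache. The PERSON_VERB_TEMPLATES table and patterns_for() are hypothetical; they only show why a tuple of languages makes a convenient hashable cache key.

```python
import re
from functools import lru_cache

# Hypothetical person-verb templates per locale; "{name}" is filled in per entity.
PERSON_VERB_TEMPLATES = {
    "en": (r"\b{name}\s+said\b", r"\b{name}\s+asked\b"),
    "pt-br": (r"\b{name}\s+disse\b",),
}

@lru_cache(maxsize=1024)
def patterns_for(name, languages=("en",)):
    """Compile every template for the enabled locales once per (name, languages)."""
    escaped = re.escape(name)
    return tuple(
        re.compile(template.format(name=escaped), re.IGNORECASE)
        for lang in languages
        for template in PERSON_VERB_TEMPLATES[lang]
    )

first = patterns_for("Alice", ("en", "pt-br"))
second = patterns_for("Alice", ("en", "pt-br"))  # same key, served from the cache
english = patterns_for("Alice", ("en",))         # different key, separate entry
```

Because the languages tuple is part of the key, English-only and multi-language callers keep separate entries instead of evicting each other.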

Documentation

  • Clarify that mempalace init requires a <dir> argument in CLI help text (#210, #862)
  • Domain name and specific impostor sites called out in the scam-alert section (#869)
  • Tightened SECURITY.md with a real version-support policy and the GHPVR-only reporting channel (#810)
  • Fixed stale pyproject.toml URLs (#853)
  • v4 planning prep (#852)

Internal

  • palace_graph tunnel helper test coverage (#908)

[3.3.0] — 2026-04-13

New Features

  • Closet layer — a compact searchable index of pointers to verbatim drawers, enabling fast topical lookup without reading all content (#788)
  • BM25 hybrid search — closets boost ranking, drawers remain the source of truth (#795, #829)
  • Entity metadata on every drawer for filterable search (#829)
  • Diary ingest — day-based rooms for conversation transcripts (#829)
  • Cross-wing tunnels — explicit links between rooms in different wings for multi-project agents (#829)
  • Drawer-grep — returns the best-matching chunk plus adjacent context drawers (#829)
  • Offline fact checker against the entity registry and knowledge graph (#829)
  • LLM-based closet regeneration — optional, bring-your-own endpoint, no mandatory API key (#793)
  • Hall detection — routes drawer content to emotions / technical / family / memory / identity / consciousness / creative halls, enabling hall-based graph connectivity within wings (#835)
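
The BM25 hybrid search entry above says that lexical matches boost ranking without filtering anything out. A minimal sketch of that idea follows, using a textbook Okapi BM25 scorer and a simple additive boost. The function names, the boost weight, and the sample scores are assumptions for illustration and are not taken from the MemPalace implementation.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenised doc against the tokenised query."""
    n = len(docs)
    if n == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def hybrid_rank(vector_scores, lexical_scores, boost=0.3):
    """Add a weighted lexical boost to each vector score; no result is dropped."""
    top = max(lexical_scores, default=0.0) or 1.0
    combined = [v + boost * (lex / top) for v, lex in zip(vector_scores, lexical_scores)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)

docs = [
    ["auth", "token", "refresh", "bug"],
    ["weekend", "hiking", "plans"],
    ["token", "budget", "notes"],
]
lexical = bm25_scores(["auth", "token"], docs)
vector = [0.60, 0.62, 0.58]          # made-up similarity scores
order = hybrid_rank(vector, lexical)
```

The document with no lexical overlap keeps its vector score and stays in the ranking; it simply receives no boost.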

Bug Fixes

  • Set hnsw:space=cosine metadata on all collection creation sites — fixes broken similarity scoring under ChromaDB's default L2 distance (#807, #218)
  • File-level locking prevents duplicate drawers when agents mine the same file concurrently (#784, #826)
  • Hybrid closet+drawer retrieval — closets boost ranking, never gate results (#795)
  • Stop hooks from making agents write in chat — saves tokens on every turn (#786)
  • Strip system tags, hook output, and Claude UI chrome from drawers before filing (#785)
  • Verbatim-safe strip_noise scoped to Claude Code JSONL only (#785)
  • Prevent diary entry ID collisions via microsecond timestamp and full content hash (#819)
  • Auto-rebuild stale drawers via NORMALIZE_VERSION schema gate
  • Enforce atomic topics in closets and extract richer pointers
  • Sync version.py to match pyproject.toml (#820)
  • Remove unused main import from mempalace/__init__.py (#827)
  • README audit — fix 7 stale claims (tool count, version badge, wake-up token cost, dialect.py lossless disclaimer, pyproject.toml version) with 42 regression-guard tests (#835)
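
For the diary ID collision fix above (#819), the sketch below shows one way to combine a microsecond-resolution timestamp with a hash over the full content. The ID layout, the diary_ prefix, and the 16-character digest truncation are assumptions; only the two ingredients come from the changelog entry.

```python
import hashlib
from datetime import datetime, timezone

def diary_entry_id(content, written_at):
    """Combine a microsecond timestamp with a hash of the entire content."""
    stamp = written_at.strftime("%Y%m%dT%H%M%S%f")  # %f is microseconds
    # Hash the whole text, so entries sharing a long prefix still differ.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"diary_{stamp}_{digest[:16]}"  # truncation length is an assumption

base = datetime(2026, 4, 13, 9, 30, 0, 0, tzinfo=timezone.utc)
later = base.replace(microsecond=1)
shared_prefix = "x" * 500

id_a = diary_entry_id(shared_prefix + " first", base)
id_b = diary_entry_id(shared_prefix + " second", base)   # same instant, other text
id_c = diary_entry_id(shared_prefix + " first", later)   # same text, 1 µs later
```

All three IDs differ, and calling the function again with the same inputs reproduces the same ID.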

Improvements

  • Optimize entity detection with regex caching and pre-compilation (#828)
  • Extract locked filing block into helper to keep mine_convos under C901 complexity

Documentation

  • Add docs/CLOSETS.md — closet layer overview
  • Fix stale milla-jovovich/* org URLs in website and plugin manifests (#787)
  • Fix remaining stale org URLs in contributor docs (#808)
  • Rewrite README.md and mempalaceofficial.com benchmark pages to remove category-error cross-system comparisons (R@5 retrieval recall had been listed next to competitor QA accuracy under one column), remove the retracted "+34% palace boost" claim from the surfaces where it had remained, replace the 100% Haiku-rerank headline with the honest held-out 98.4% R@5, drop the LoCoMo 100% top-50 row (retrieval-bypass artefact), and fix the broken aya-thekeeper/mempal reproduction URL (#875)
  • Add docs/HISTORY.md as the canonical home for corrections, retractions, and public notices; move the 2026-04-07 "Note from Milla & Ben" and the 2026-04-11 impostor-domain notice out of README.md
  • Add v3.3.0 reproduction result JSONLs and the deterministic seed=42 50/450 LongMemEval split under benchmarks/ — every BENCHMARKS.md claim reproduces exactly

Internal

  • Add test coverage for mine_lock, closets, entity metadata, BM25, and diary
  • Verify mine_lock via disjoint critical-section intervals
  • Serialize mine_lock concurrency test with multiprocessing
  • Make diary state path assertion platform-neutral
  • Add TestTunnels coverage for cross-wing tunnel operations
  • Ruff format with CI-pinned version (0.4.x); format mempalace/palace.py

3.2.0 — 2026-04-12

Packaging

  • Remove chromadb<0.7 upper bound — unblocks installs against chromadb 1.x palaces (#690)
  • Bump version to 3.2.0 across pyproject.toml, mempalace/version.py, README badge, and OpenClaw SKILL (#761)

Security

  • Harden palace deletion, WAL redaction, and MCP search input handling (#739)
  • Consistent input validation, argument whitelisting, concurrency safety, and WAL fixes (#647)
  • Remove hardcoded credential paths from benchmark runners (#177)
  • Remove global SSL verification bypass in convomem_bench (#176)

Bug Fixes

  • Parse Claude.ai privacy export with messages key and sender field (#685, #677)
  • Detect mtime changes in _get_client to prevent stale HNSW index (#757)
  • Hash full content in tool_add_drawer drawer ID — stable re-mines (#716)
  • Remove 10k drawer cap from status display (#707, #603)
  • Correct typo in entity_detector interactive classification prompt (#755)
  • Prevent convo_miner from re-processing 0-chunk files on every run (#732, #654)
  • Remove silent 8-line AI response truncation in convo_miner (#708, #692)
  • Store full AI response in convo_miner exchange chunking (#695)
  • Fix mine --dry-run TypeError on files with room=None (#687, #586)
  • Skip arg whitelist for handlers accepting **kwargs (#684, #572)
  • Allow Unicode in sanitize_name() — Latvian, CJK, Cyrillic (#683, #637)
  • Auto-repair BLOB seq_ids from chromadb 0.6→1.5 migration (#664)
  • Remove no-op ORT_DISABLE_COREML env var (#653, #397)
  • Disambiguate hook block reasons to name MemPalace explicitly (#666)
  • Use epsilon comparison for mtime to prevent unnecessary re-mining (#610)
  • Correct token count estimate in compress summary (#609)
  • Implement MCP ping health checks (#600)
  • Align cmd_compress dict keys with compression_stats() return values (#569)
  • Skip unreachable reparse points in detect_rooms_from_folders on Windows (#558)
  • Prevent HNSW index bloat from duplicate add() calls (#544, #525)
  • Purge stale drawers before re-mine to avoid hnswlib segfault (#544)
  • Mitigate system prompt contamination in search queries (#385, #333)
  • Count Codex user_message turns in _count_human_messages (#373, #347)
  • Paginate large collection reads and surface errors in MCP tools (#371, #339, #338)
  • Expand ~ in split command directory argument (#361)
  • Ignore wait_for_previous argument to support Gemini MCP clients (#322)
  • Close KnowledgeGraph SQLite connections in test fixtures (#450)
  • Remove duplicate cache variable declarations in mcp_server.py (#449)
  • Add --yes flag to init instructions for non-interactive use (#682, #534)
  • Add mcp command with setup guidance (#315)
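
The epsilon comparison for mtime mentioned above (#610) can be sketched as follows. The tolerance value and the is_unchanged() helper are hypothetical; the point is only that a stored timestamp which lost precision should not count as a modification.

```python
import math

MTIME_EPSILON = 0.01  # seconds; assumed tolerance, not the project's actual value

def is_unchanged(stored_mtime, current_mtime, eps=MTIME_EPSILON):
    """Treat two mtimes as equal when they differ by at most eps seconds."""
    return math.isclose(stored_mtime, current_mtime, rel_tol=0.0, abs_tol=eps)

current = 1775728800.123456
stored = round(current, 3)            # e.g. precision lost when the value was saved

same_file = is_unchanged(stored, current)            # tiny drift: skip re-mining
edited_file = is_unchanged(stored, current + 2.0)    # real change: re-mine
```

A strict equality check would have flagged the first case as modified and triggered a needless re-mine.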

New Features

  • i18n support — 8 languages (en, es, fr, de, ja, ko, zh-CN, zh-TW) (#718)
  • New MCP tools: get/list/update drawer, hook settings, export (#667, #635)
  • mempalace migrate — recover palaces from different ChromaDB versions (#502)
  • Add OpenClaw/ClawHub skill (#491)
  • Backend seam for pluggable storage backends (#413)

Improvements

  • Disable broken auto-bump workflow (#414)
  • Improve agent readiness — AGENTS.md, dependabot, CODEOWNERS, labels (#497)

Documentation

  • Add CLAUDE.md and mission/principles to AGENTS.md (#720)
  • Add VitePress documentation site (#439)
  • Add warning about fake MemPalace websites (#598)
  • Fix stale org URLs and PR branch target in contributor docs (#679)
  • Fix misaligned architecture diagram (#734, #733)
  • Add ROADMAP.md — v3.1.1 stability patch and v4.0.0-alpha plan

Internal

  • ruff format convo_miner.py (#741)
  • ruff format all Python files (#675)
  • CI: trigger tests on develop branch PRs and pushes (#674)
  • CI: fix GitHub Pages publishing (#691)

3.1.0 — 2026-04-09

Security

  • Harden inputs, fix shell injection, optimize DB access (#387)
  • Sanitize SESSION_ID in save hook to prevent path traversal (#141)
  • Sanitize error responses and remove sys.exit from library code (#139)
  • Shell injection fix in hooks, Claude Code mining, chromadb pin (#114)
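
As an illustration of the SESSION_ID sanitization above (#141), a whitelist approach might look like the sketch below. The allowed character set, the fallback value, and both helper names are assumptions rather than the project's actual rules.

```python
import re
from pathlib import Path

_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")

def safe_session_id(raw, fallback="unknown"):
    """Drop every character outside a conservative whitelist."""
    cleaned = _UNSAFE.sub("", raw)
    return cleaned or fallback

def state_file(state_dir, raw_session_id):
    """Build a state path that cannot escape state_dir."""
    return Path(state_dir) / f"{safe_session_id(raw_session_id)}.json"

hostile = state_file("/tmp/state", "../../etc/passwd")
normal = state_file("/tmp/state", "abc-123_X")
empty = safe_session_id("../..")
```

With dots and slashes removed, the hostile ID can no longer climb out of the state directory.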

Bug Fixes

  • MCP null args hang, repair infinite recursion, OOM on large files (#399)
  • Release ChromaDB handles before rmtree on Windows (#392)
  • Use os.utime in mtime test for Windows compatibility (#392)
  • Negotiate MCP protocol version instead of hardcoding (#324)
  • Use upsert and deterministic IDs to prevent data stagnation (#140)
  • Make drawer_id deterministic for idempotent writes (#387)
  • Honest AAAK stats — word-based token estimator, lossy labels (#147)
  • Room detection checks keywords against folder paths (#145)
  • Use actual detected room in mine summary stats (#165)
  • Honour --palace flag in mcp_server (#264)
  • Preserve default KG path when --palace not passed (#270)
  • --yes flag skips all interactive prompts in init (#123)
  • Repair command, split args, Claude export, room keywords (#119)
  • Replace Unicode separator in convo_miner.py for Windows compatibility (#129)
  • Coerce MCP integer arguments to native Python int (#84)
  • Batch ChromaDB reads to avoid SQLite variable limit (#66)
  • Respect nested .gitignore rules during mining (#78)
  • Narrow bare except Exception to specific types where safe (#54)
  • Mark MD5 as non-security in miner drawer ID generation (#53)
  • Remove dead code and duplicate set items in entity_registry.py (#42)
  • Silence ChromaDB telemetry warnings and CoreML segfault on Apple Silicon (#236)
  • Unify package and MCP version reporting (#16)
  • Fix broken AAAK Dialect link in README (#238)
  • Update input prompt for entity confirmation (#83)
  • Preserve CLI exit codes, log tracebacks, sanitize search errors (#139)
  • Enable SQLite WAL mode and add consistent LIMIT to KG timeline (#136)
  • Add limit=10000 safety cap to all unbounded ChromaDB .get() calls (#137)
  • Re-mine modified files, idempotent add_drawer, cleanup ChromaDB handles (#140)
  • Resolve formatting, regression logic, and pytest defaults (#270)
  • Use parse_known_args to allow importing mcp_server during pytest (#270)
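
The batched-reads fix above (#66) works around SQLite's cap on bound variables per statement (999 by default before SQLite 3.32.0). The sketch below shows the general batching technique against a plain in-memory SQLite table. The drawers table, the batch size, and the helper names are hypothetical, and no ChromaDB API is involved.

```python
import sqlite3

def chunks(items, size):
    """Yield consecutive slices of at most size items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_by_ids(conn, ids, batch_size=500):
    """Read rows in batches so no statement binds more than batch_size variables."""
    rows = []
    for batch in chunks(ids, batch_size):
        placeholders = ",".join("?" * len(batch))
        query = f"SELECT id, body FROM drawers WHERE id IN ({placeholders})"
        rows.extend(conn.execute(query, batch).fetchall())
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drawers (id TEXT PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO drawers VALUES (?, ?)",
    [(f"drawer_{i}", f"body {i}") for i in range(1200)],
)
all_ids = [f"drawer_{i}" for i in range(1200)]
rows = fetch_by_ids(conn, all_ids)
```

All 1200 rows come back over three statements, each well under the variable limit.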

New Features

  • Package MemPalace as standard Claude and Codex plugins (#270)
  • Add OpenAI Codex CLI JSONL normalizer (#61)
  • Add Codex plugin support with hooks, commands, and documentation (#270)
  • Add command documentation for help, init, mine, search, and status (#270)

Improvements

  • Cache ChromaDB PersistentClient instead of re-instantiating per call (#135)
  • Tighten chromadb version range and add py.typed marker (#142)
  • Consolidate split known-names config loading (#22)
  • CI: add separate jobs for Windows and macOS testing
  • CI: Upgrade GitHub Actions for Node 24 compatibility (#55)

Documentation

  • Add Gemini CLI setup guide and integration section (#106)
  • Add beginner-friendly hooks tutorial (#103)
  • Align MCP setup examples with shipped server (#21)
  • Honest README update — own the mistakes, fix the claims

Internal

  • Expand test coverage from 20 to 92 tests, migrate to uv (#131)
  • Add scale benchmark suite — 106 tests (#223)
  • Increase test coverage from 30% to 85%, fix Windows encoding bugs (#281)
  • Add WAL mode and entity timeline limit assertions
  • Add coverage for file_already_mined mtime check

3.0.0 — 2026-04-06

Initial public release.

  • Palace architecture with day-based rooms, drawers (verbatim), and closets (searchable index)
  • AAAK compression dialect for memory folding
  • Knowledge graph with entity detection and timeline queries
  • MCP server for Claude, Codex, and Gemini integration
  • CLI: init, mine, search, status, compress, repair, split
  • Benchmark suite with recall and scale tests
  • README with MCP flow, local model flow, and specialist agent documentation