Every conversation you have with an AI — every decision, every debugging session, every architecture debate — disappears when the session ends. Six months of work, gone. You start over every time.
Other memory systems try to fix this by letting the AI decide what's worth remembering. They extract "user prefers Postgres" and throw away the conversation where you explained why. MemPalace takes a different approach: store everything, then make it findable.
The Palace — Ancient Greek orators memorized entire speeches by placing ideas in rooms of an imaginary building. Walk through the building, find the idea. MemPalace applies the same principle to AI memory: your conversations are organized into wings (people and projects), halls (types of memory), and rooms (specific ideas). No AI decides what matters — you keep every word, and the structure gives you a navigable map instead of a flat search index.
Raw verbatim storage — MemPalace stores your actual exchanges in ChromaDB without summarization or extraction. The 96.6% LongMemEval result comes from this raw mode. We don't burn an LLM to decide what's "worth remembering" — we keep everything and let semantic search find it.
AAAK (experimental) — A lossy abbreviation dialect for packing repeated entities into fewer tokens at scale. Readable by any LLM that reads text — Claude, GPT, Gemini, Llama, Mistral — no decoder needed. AAAK is a separate compression layer, not the storage default, and on the LongMemEval benchmark it currently regresses vs raw mode (84.2% vs 96.6%). We're iterating. See the note above for the honest status.
Local, open, adaptable — MemPalace runs entirely on your machine, on any data you have locally, without any external APIs or services. It has been tested on conversations, but it can be adapted to other types of datastores. This is why we're open-sourcing it.
Quick Start · Fork Features · The Palace · AAAK Dialect · Benchmarks · MCP Tools
- 96.6% — LongMemEval R@5, raw mode, zero API calls
- 500/500 — questions tested, independently reproduced
- $0 — no subscription, no cloud, local only
Reproducible — runners in benchmarks/. Full results. The 96.6% is from raw verbatim mode, not AAAK or rooms mode (those score lower — see note above).
The community caught real problems in this README within hours of launch and we want to address them directly.
What we got wrong:
The AAAK token example was incorrect. We used a rough heuristic (`len(text)//3`) for token counts instead of an actual tokenizer. Real counts via OpenAI's tokenizer: the English example is 66 tokens, the AAAK example is 73. AAAK does not save tokens at small scales — it's designed for repeated entities at scale, and the README example was a bad demonstration of that. We're rewriting it.

"30x lossless compression" was overstated. AAAK is a lossy abbreviation system (entity codes, sentence truncation). Independent benchmarks show AAAK mode scores 84.2% R@5 vs raw mode's 96.6% on LongMemEval — a 12.4-point regression. The honest framing is: AAAK is an experimental compression layer that trades fidelity for token density, and the 96.6% headline number is from RAW mode, not AAAK.
"+34% palace boost" was misleading. That number compares unfiltered search to wing+room metadata filtering. Metadata filtering is a standard ChromaDB feature, not a novel retrieval mechanism. Real and useful, but not a moat.
"Contradiction detection" exists as a separate utility (`fact_checker.py`) but is not currently wired into the knowledge graph operations as the README implied.

"100% with Haiku rerank" is real (we have the result files), but the rerank pipeline is not in the public benchmark scripts. We're adding it.
What's still true and reproducible:
- 96.6% R@5 on LongMemEval in raw mode, on 500 questions, zero API calls — independently reproduced on M2 Ultra in under 5 minutes by @gizmax.
- Local, free, no subscription, no cloud, no data leaving your machine.
- The architecture (wings, rooms, closets, drawers) is real and useful, even if it's not a magical retrieval boost.
What we're doing:
- Rewriting the AAAK example with real tokenizer counts and a scenario where AAAK actually demonstrates compression
- Adding the mode (raw / aaak / rooms) clearly to the benchmark documentation so the trade-offs are visible
- Wiring `fact_checker.py` into the KG ops so the contradiction detection claim becomes true
- Pinning ChromaDB to a tested range (Issue #100), fixing the shell injection in hooks (#110), and addressing the macOS ARM64 segfault (#74)
Thank you to everyone who poked holes in this. Brutal honest criticism is exactly what makes open source work, and it's what we asked for. Special thanks to @panuhorsmalahti, @lhl, @gizmax, and everyone who filed an issue or a PR in the first 48 hours. We're listening, we're fixing, and we'd rather be right than impressive.
— Milla Jovovich & Ben Sigman
# One-line install — clones, installs, registers MCP server
curl -fsSL https://raw.githubusercontent.com/EndeavorYen/mempalace/main/install.sh | bash
# Or from a local clone:
git clone https://github.com/EndeavorYen/mempalace.git
cd mempalace
bash install.sh

# Update to latest
bash update.sh
# Uninstall (keeps your palace data by default)
bash uninstall.sh # keeps ~/.mempalace/
bash uninstall.sh --purge # removes everything

Note: `pip install mempalace` installs the original upstream version without multilingual support, session persistence, or token optimization. Use the install script above for the fork.
# Set up your world — who you work with, what your projects are
mempalace init ~/projects/myapp
# Mine your data
mempalace mine ~/projects/myapp # projects — code, docs, notes
mempalace mine ~/chats/ --mode convos # convos — Claude, ChatGPT, Slack exports
mempalace mine ~/chats/ --mode convos --extract general # general — classifies into decisions, milestones, problems
# Search anything you've ever discussed
mempalace search "why did we switch to GraphQL"
# Your AI remembers
mempalace status

Three mining modes: projects (code and docs), convos (conversation exports), and general (auto-classifies into decisions, preferences, milestones, problems, and emotional context). Everything stays on your machine.
This fork extends MemPalace with multilingual support, session persistence, token optimization, and multi-harness plugin integration — features not available in the original repo.
| Feature | Original MemPalace | This Fork |
|---|---|---|
| Languages | English only | 8 languages tested (zh-Hans, zh-Hant, en, fr, es, de, ja, ko) |
| Room Classification | English keyword matching | Embedding-based semantic classification (language-agnostic) |
| Session Persistence | None — context lost on /clear | Save → clear → restore with ~1200 tokens |
| Knowledge Graph | Manual triples only, single-hop queries | Auto-extraction (NER + LLM), multi-hop traversal, path finding |
| MCP Tools | 22 tools | 21 tools (merged core + KG extraction/traversal/path) |
| Auto-Save | None | Hooks on Stop (every 15 msgs) + PreCompact |
| Wake-Up | Manual | Auto-injects L0+L1 on SessionStart |
| Plugin Support | Manual MCP setup | Claude Code + Codex plugins with marketplace install |
| Gemini CLI | Not supported | Documented integration with auto-save hooks |
| JSON Output | Pretty-printed | Compact (saves ~40% tokens per response) |
The original MemPalace only works with English. This fork uses embedding-based semantic classification — new languages work with zero configuration.
| Component | Original | This Fork |
|---|---|---|
| Entity Detection | English regex (\b[A-Z][a-z]+\b) | + Chinese name patterns (百家姓 surnames, verb signals) |
| Memory Extraction | English-only regex markers | Embedding-based semantic classification |
| Spellcheck | English only, corrupts CJK | Auto-skips non-English text |
| Embedding Model | English-only (all-MiniLM-L6-v2) | Multilingual (paraphrase-multilingual-MiniLM-L12-v2) |
| Simplified/Traditional | N/A | Both supported, OpenCC-validated consistency |
Benchmark — 100% (Grade A) across 8 languages, 173 test cases:
Base Benchmark (122 cases) Extended Benchmark (51 cases)
────────────────────────── ─────────────────────────────
Language Detection 100% 38/38 Long-Form Room Class. 100% 17/17
Entity Detection 100% 10/10 Complex Entity Det. 100% 6/6
Room Classification 100% 31/31 Deep Memory Extraction 100% 9/9
Memory Extraction 100% 11/11 Cross-Language Search 100% 10/10
Search Quality 100% 14/14 Robustness 100% 9/9
OpenCC Consistency 100% 18/18
────────────────────────── ─────────────────────────────
OVERALL 100% 122/122 OVERALL 100% 51/51
# French — zero config, works via semantic similarity
detect_convo_room("Le code a un bug dans la base de données.") # → "technical"
# Japanese — zero config
extract_memories("PostgreSQLに移行することにしました。") # → [decision]

# Install with multilingual support
pip install -e ".[multilingual]"
# Search in any language
mempalace search "数据库架构设计"
mempalace search "database architecture"

The multilingual embedding model (~120MB) downloads automatically on first use. No GPU required — runs on CPU in milliseconds.
Run the benchmarks yourself: python -m benchmarks.multilingual_benchmark --verbose and python -m benchmarks.multilingual_benchmark_extended --verbose
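Under the hood, language-agnostic room classification reduces to nearest-prototype search in embedding space. A minimal sketch of the idea, with tiny hand-made vectors standing in for the real model's outputs (the room names, prototype vectors, and `TOY_EMBEDDINGS` table here are all illustrative — real code would call the multilingual model's `encode()`):

```python
import math

# Hypothetical per-room prototype vectors (real ones come from the embedding model).
ROOM_PROTOTYPES = {
    "technical": [0.9, 0.1, 0.0],
    "personal":  [0.1, 0.9, 0.0],
    "planning":  [0.0, 0.2, 0.9],
}

# Stand-ins for model outputs — any language maps into the same vector space.
TOY_EMBEDDINGS = {
    "Le code a un bug dans la base de données.": [0.8, 0.2, 0.1],
    "PostgreSQLに移行することにしました。": [0.1, 0.1, 0.8],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def detect_convo_room(text):
    vec = TOY_EMBEDDINGS[text]  # real code: vec = model.encode(text)
    return max(ROOM_PROTOTYPES, key=lambda room: cosine(vec, ROOM_PROTOTYPES[room]))

print(detect_convo_room("Le code a un bug dans la base de données."))  # → technical
```

Because classification compares vectors rather than matching English keywords, a new language works as soon as the embedding model covers it — no per-language rules.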
Long AI sessions hit context limits. The standard fix — /clear — wipes everything. This fork adds a save → clear → restore cycle that preserves continuity at ~1200 tokens:
/save → saves current task, progress, decisions, memory triggers
/clear → wipes context window
/restore → restores state + wake-up + recent checkpoints (~1200 tokens)
Three MCP tools power this:
| Tool | What |
|---|---|
| session_checkpoint | Save state.md + diary entry before /clear |
| session_restore | Restore state + L0/L1 wake-up + recent checkpoints |
| session_list | List projects with saved checkpoints |
State is stored as a deterministic file (~/.mempalace/sessions/{project}/state.md) — not in ChromaDB — so restore is exact, not semantic.
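The deterministic-file design is simple enough to sketch: checkpoint writes a plain file, restore reads it back byte-for-byte. This is an illustration only — the field names and exact state.md schema are assumptions, not the fork's actual format:

```python
import tempfile
from pathlib import Path

def session_checkpoint(root, project, task, decisions):
    # Write state to a fixed, predictable path (mirrors ~/.mempalace/sessions/{project}/state.md).
    state = root / "sessions" / project / "state.md"
    state.parent.mkdir(parents=True, exist_ok=True)
    state.write_text(f"# Session state\ntask: {task}\ndecisions: {'; '.join(decisions)}\n")
    return state

def session_restore(root, project):
    # Exact file read-back — no embedding lookup, so restore is lossless.
    return (root / "sessions" / project / "state.md").read_text()

root = Path(tempfile.mkdtemp())
session_checkpoint(root, "myapp", "auth migration", ["use Clerk", "drop Auth0"])
print(session_restore(root, "myapp"))
```

The contrast with ChromaDB storage is the point: semantic search returns the *closest* match, while a fixed path returns the *exact* state you saved.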
Every MCP tool definition costs tokens in the AI's context window. This fork reduces overhead at three layers:
Tool consolidation (22 → 18 tools) — overlapping tools merged:

- `list_wings` + `list_rooms` + `get_taxonomy` → `mempalace_taxonomy`
- `graph_stats` + `kg_stats` → folded into `mempalace_status`
- Old tool names return helpful redirect messages via the `DEPRECATED_TOOLS` map
Compact JSON output — responses use single-line JSON instead of pretty-printed, saving ~40% tokens per tool response. Set MEMPALACE_DEBUG=1 for readable output during development.
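One plausible way to implement the compact mode is the standard `json.dumps` separators trick. The payload below is made up, and the exact savings depend on payload shape — the ~40% figure is the fork's measurement, not something this toy proves:

```python
import json

# Illustrative tool response — not the actual MemPalace schema.
response = {
    "results": [
        {"id": "drawer_001", "wing": "wing_kai", "room": "auth-migration",
         "snippet": "Kai debugged the OAuth token refresh"},
    ],
    "count": 1,
}

pretty = json.dumps(response, indent=2)            # human-readable, whitespace-heavy
compact = json.dumps(response, separators=(",", ":"))  # single line, no padding

print(len(pretty), len(compact))  # compact is meaningfully shorter
```

Since tool responses land in the model's context window, every character of indentation is paid for in tokens on every call.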
Search truncation — snippet_len=200 default trims drawer content in search results. Full content still retrievable by ID.
SessionStart hook — automatically injects L0 (identity) + L1 (essential story) into the conversation when it starts. No manual mempalace wake-up needed.
Stop hook — every 15 exchanges, triggers a structured save: topics, decisions, quotes, code changes. Also regenerates the critical facts layer.
PreCompact hook — fires before context compression. Emergency save before the window shrinks.
{
"hooks": {
"SessionStart": [{"matcher": "", "hooks": [{"type": "command", "command": "mempal-session-start-hook.sh"}]}],
"Stop": [{"matcher": "", "hooks": [{"type": "command", "command": "mempal_save_hook.sh"}]}],
"PreCompact": [{"matcher": "", "hooks": [{"type": "command", "command": "mempal_precompact_hook.sh"}]}]
}
}

Optional auto-ingest: Set the MEMPAL_DIR environment variable to a directory path and the hooks will automatically run mempalace mine on that directory during each save trigger (background on stop, synchronous on precompact).
All hooks are pre-configured when installed as a plugin — no manual setup required.
Claude Code — native marketplace install:
claude plugin marketplace add milla-jovovich/mempalace
claude plugin install --scope user mempalace

Includes hooks (SessionStart, Stop, PreCompact), slash commands (/save, /restore, /save-clear), and auto-pip-install on first load.
Codex CLI — parallel plugin with dedicated hooks and marketplace metadata in .codex-plugin/.
Gemini CLI — documented integration guide with MCP registration and PreCompress hook. See examples/gemini_cli_setup.md.
After the one-time setup (install → init → mine), you don't run MemPalace commands manually. Your AI uses it for you. There are two ways, depending on which AI you use.
Native marketplace install:
claude plugin marketplace add milla-jovovich/mempalace
claude plugin install --scope user mempalace

Restart Claude Code, then type /skills to verify "mempalace" appears.
# Connect MemPalace once
claude mcp add mempalace -- python -m mempalace.mcp_server

Now your AI has 19 tools available through MCP. Ask it anything:
"What did we decide about auth last month?"
Claude calls mempalace_search automatically, gets verbatim results, and answers you. You never type mempalace search again. The AI handles it.
MemPalace also works natively with Gemini CLI (which handles the server and save hooks automatically) — see the Gemini CLI Integration Guide.
Local models generally don't speak MCP yet. Two approaches:
1. Wake-up command — load your world into the model's context:
mempalace wake-up > context.txt
# Paste context.txt into your local model's system prompt

This gives your local model ~170 tokens of critical facts (in AAAK if you prefer) before you ask a single question.
2. CLI search — query on demand, feed results into your prompt:
mempalace search "auth decisions" > results.txt
# Include results.txt in your prompt

Or use the Python API:
from mempalace.searcher import search_memories
results = search_memories("auth decisions", palace_path="~/.mempalace/palace")
# Inject into your local model's context

Either way — your entire memory stack runs offline. ChromaDB on your machine, Llama on your machine, AAAK for compression, zero cloud calls.
Decisions happen in conversations now. Not in docs. Not in Jira. In conversations with Claude, ChatGPT, Copilot. The reasoning, the tradeoffs, the "we tried X and it failed because Y" — all trapped in chat windows that evaporate when the session ends.
Six months of daily AI use = 19.5 million tokens. That's every decision, every debugging session, every architecture debate. Gone.
| Approach | Tokens loaded | Annual cost |
|---|---|---|
| Paste everything | 19.5M — doesn't fit any context window | Impossible |
| LLM summaries | ~650K | ~$507/yr |
| MemPalace wake-up | ~170 tokens | ~$0.70/yr |
| MemPalace + 5 searches | ~13,500 tokens | ~$10/yr |
MemPalace loads 170 tokens of critical facts on wake-up — your team, your projects, your preferences. Then searches only when needed. $10/year to remember everything vs $507/year for summaries that lose context.
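The table above is back-of-envelope arithmetic: tokens loaded per session, times sessions per year, times the price per token. A small helper makes the assumptions explicit — the per-million-token price and session count below are hypothetical placeholders, not MemPalace's numbers, so plug in your own:

```python
def annual_context_cost(tokens_per_session, sessions_per_year, usd_per_million_tokens):
    # Cost of repeatedly loading the same context over a year.
    return tokens_per_session * sessions_per_year * usd_per_million_tokens / 1_000_000

# e.g. a 170-token wake-up, twice a day, at an assumed $3 per million input tokens:
print(f"${annual_context_cost(170, 730, 3.0):.2f}/yr")
```

The shape of the result is what matters: a fixed ~170-token wake-up stays near zero regardless of how much history accumulates, while summary-based approaches grow with the archive.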
The layout is fairly simple, though it took a long time to get there.
It starts with a wing. Every project, person, or topic you're filing gets its own wing in the palace.
Each wing has rooms connected to it, where information is divided into subjects that relate to that wing — so every room is a different element of what your project contains. Project ideas could be one room, employees could be another, financial statements another. There can be an endless number of rooms that split the wing into sections. The MemPalace install detects these for you automatically, and of course you can personalize it any way you feel is right.
Every room has a closet connected to it, and here's where things get interesting. We've developed an AI language called AAAK. Don't ask — it's a whole story of its own. Your agent learns the AAAK shorthand every time it wakes up. Because AAAK is essentially English, but a very truncated version, your agent understands how to use it in seconds. It comes as part of the install, built into the MemPalace code. In our next update, we'll add AAAK directly to the closets, which will be a real game changer — the amount of info in the closets will be much bigger, but it will take up far less space and far less reading time for your agent.
Inside those closets are drawers, and those drawers are where your original files live. In this first version, we haven't used AAAK as a closet tool, but even so, raw retrieval has shown 96.6% recall on the LongMemEval benchmark. Once the closets use AAAK, searches should be faster while keeping every word exact. But even now, the closet approach has been a huge boon to how much info is stored in a small space — it's used to easily point your AI agent to the drawer where your original file lives. You never lose anything, and all this happens in seconds.
There are also halls, which connect rooms within a wing, and tunnels, which connect rooms from different wings to one another. So finding things becomes truly effortless — we've given the AI a clean and organized way to know where to start searching, without having to look through every keyword in huge folders.
You say what you're looking for and boom, it already knows which wing to go to. Just that in itself would have made a big difference. But this is beautiful, elegant, organic, and most importantly, efficient.
┌─────────────────────────────────────────────────────────────┐
│ WING: Person │
│ │
│ ┌──────────┐ ──hall── ┌──────────┐ │
│ │ Room A │ │ Room B │ │
│ └────┬─────┘ └──────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Closet │ ───▶ │ Drawer │ │
│ └──────────┘ └──────────┘ │
└─────────┼──────────────────────────────────────────────────┘
│
tunnel
│
┌─────────┼──────────────────────────────────────────────────┐
│ WING: Project │
│ │ │
│ ┌────┴─────┐ ──hall── ┌──────────┐ │
│ │ Room A │ │ Room C │ │
│ └────┬─────┘ └──────────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Closet │ ───▶ │ Drawer │ │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
Wings — a person or project. As many as you need.
Rooms — specific topics within a wing. Auth, billing, deploy — endless rooms.
Halls — connections between related rooms within the same wing. If Room A (auth) and Room B (security) are related, a hall links them.
Tunnels — connections between wings. When Person A and a Project both have a room about "auth," a tunnel cross-references them automatically.
Closets — summaries that point to the original content. (In v3.0.0 these are plain-text summaries; AAAK-encoded closets are coming in a future update — see Task #30.)
Drawers — the original verbatim files. The exact words, never summarized.
Halls are memory types — the same in every wing, acting as corridors:
- `hall_facts` — decisions made, choices locked in
- `hall_events` — sessions, milestones, debugging
- `hall_discoveries` — breakthroughs, new insights
- `hall_preferences` — habits, likes, opinions
- `hall_advice` — recommendations and solutions
Rooms are named ideas — auth-migration, graphql-switch, ci-pipeline. When the same room appears in different wings, it creates a tunnel — connecting the same topic across domains:
wing_kai / hall_events / auth-migration → "Kai debugged the OAuth token refresh"
wing_driftwood / hall_facts / auth-migration → "team decided to migrate auth to Clerk"
wing_priya / hall_advice / auth-migration → "Priya approved Clerk over Auth0"
Same room. Three wings. The tunnel connects them.
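Tunnel discovery is essentially a set intersection: any room name that appears in more than one wing implies a tunnel. A sketch of that idea — the nested dict layout and `find_tunnels` helper are illustrative, not the actual palace storage format:

```python
# Hypothetical palace layout: wing → hall → list of room names.
palace = {
    "wing_kai":       {"hall_events": ["auth-migration", "ci-pipeline"]},
    "wing_driftwood": {"hall_facts":  ["auth-migration", "graphql-switch"]},
    "wing_priya":     {"hall_advice": ["auth-migration"]},
}

def find_tunnels(palace):
    # Flatten each wing's halls into one set of room names.
    rooms_by_wing = {
        wing: {room for rooms in halls.values() for room in rooms}
        for wing, halls in palace.items()
    }
    # A room shared by 2+ wings is a tunnel.
    tunnels = {}
    for room in set.union(*rooms_by_wing.values()):
        wings = sorted(w for w, rooms in rooms_by_wing.items() if room in rooms)
        if len(wings) > 1:
            tunnels[room] = wings
    return tunnels

print(find_tunnels(palace))  # → {'auth-migration': ['wing_driftwood', 'wing_kai', 'wing_priya']}
```

No similarity search is needed for this step — the shared room name is the cross-reference.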
Tested on 22,000+ real conversation memories:
Search all closets: 60.9% R@10
Search within wing: 73.1% (+12 pts)
Search wing + hall: 84.8% (+24 pts)
Search wing + room: 94.8% (+34 pts)
Wings and rooms aren't cosmetic. They're a 34-point retrieval improvement. The palace structure is the product.
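Why filtering helps, in miniature: restricting candidates by wing/room metadata before ranking removes near-duplicate distractors from other wings. ChromaDB supports this natively via metadata filters; the toy version below shows the principle with fake similarity scores (the closet records and scores are invented for illustration):

```python
# Pretend search index: each closet carries wing/room metadata and a similarity score.
closets = [
    {"wing": "wing_kai",  "room": "auth-migration", "text": "Kai fixed OAuth refresh", "score": 0.81},
    {"wing": "wing_nova", "room": "auth-migration", "text": "Nova auth prototype",     "score": 0.84},
    {"wing": "wing_kai",  "room": "ci-pipeline",    "text": "Kai tuned CI caching",    "score": 0.79},
]

def search(closets, where=None, k=1):
    # Metadata pre-filter narrows the candidate pool, then rank by similarity.
    pool = [c for c in closets if not where or all(c[f] == v for f, v in where.items())]
    return sorted(pool, key=lambda c: c["score"], reverse=True)[:k]

# Unfiltered, a near-duplicate from the wrong wing wins; filtered, it can't compete.
print(search(closets)[0]["text"])
print(search(closets, where={"wing": "wing_kai", "room": "auth-migration"})[0]["text"])
```

This is also why the corrections note calls the boost "real and useful, but not a moat" — the mechanism is standard metadata filtering; the palace's contribution is giving every drawer meaningful metadata to filter on.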
| Layer | What | Size | When |
|---|---|---|---|
| L0 | Identity — who is this AI? | ~50 tokens | Always loaded |
| L1 | Critical facts — team, projects, preferences | ~120 tokens (AAAK) | Always loaded |
| L2 | Room recall — recent sessions, current project | On demand | When topic comes up |
| L3 | Deep search — semantic query across all closets | On demand | When explicitly asked |
Your AI wakes up with L0 + L1 (~170 tokens) and knows your world. Searches only fire when needed.
AAAK is a lossy abbreviation system — entity codes, structural markers, and sentence truncation — designed to pack repeated entities and relationships into fewer tokens at scale. It is readable by any LLM that reads text (Claude, GPT, Gemini, Llama, Mistral) without a decoder, so a local model can use it without any cloud dependency.
Honest status (April 2026):
- AAAK is lossy, not lossless. It uses regex-based abbreviation, not reversible compression.
- It does not save tokens at small scales. Short text already tokenizes efficiently. AAAK overhead (codes, separators) costs more than it saves on a few sentences.
- It can save tokens at scale — in scenarios with many repeated entities (a team mentioned hundreds of times, the same project across thousands of sessions), the entity codes amortize.
- AAAK currently regresses LongMemEval vs raw verbatim retrieval (84.2% R@5 vs 96.6%). The 96.6% headline number is from raw mode, not AAAK mode.
- The MemPalace storage default is raw verbatim text in ChromaDB — that's where the benchmark wins come from. AAAK is a separate compression layer for context loading, not the storage format.
We're iterating on the dialect spec, adding a real tokenizer for stats, and exploring better break points for when to use it. Track progress in Issue #43 and #27.
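The scale argument can be made concrete with a toy entity-code substitution. Everything here is illustrative — the code table, names, and character counts are made up, and character counts are only a rough proxy: per the note above, real savings must be measured with an actual tokenizer.

```python
# Hypothetical entity-code table (real AAAK codes come from the entity registry).
CODES = {"Alexandra Papadopoulos": "AP", "auth-migration": "AM"}

def abbreviate(text):
    # Lossy substitution: long entity names become short codes.
    for entity, code in CODES.items():
        text = text.replace(entity, code)
    return text

short_text = "Alexandra Papadopoulos owns auth-migration."
long_text = " ".join(["Alexandra Papadopoulos reviewed auth-migration."] * 50)

# One mention: the overhead of teaching the code table can exceed the savings.
# Fifty mentions: the same table amortizes across every repetition.
print(len(short_text), len(abbreviate(short_text)))
print(len(long_text), len(abbreviate(long_text)))
```

The asymmetry is the whole design question: codes only pay for themselves once an entity recurs enough times to amortize the table, which is why AAAK regresses on short benchmark contexts but may help on months of accumulated history.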
A separate utility (fact_checker.py) can check assertions against entity facts. It's not currently called automatically by the knowledge graph operations — this is being fixed (track in Issue #27). When enabled it catches things like:
Input: "Soren finished the auth migration"
Output: 🔴 AUTH-MIGRATION: attribution conflict — Maya was assigned, not Soren
Input: "Kai has been here 2 years"
Output: 🟡 KAI: wrong_tenure — records show 3 years (started 2023-04)
Input: "The sprint ends Friday"
Output: 🟡 SPRINT: stale_date — current sprint ends Thursday (updated 2 days ago)
Facts checked against the knowledge graph. Ages, dates, and tenures calculated dynamically — not hardcoded.
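A minimal sketch of the checking idea — match an asserted attribution against stored facts and flag the conflict. This is not the actual `fact_checker.py` logic; the `FACTS` table and `check_attribution` helper are invented to illustrate the mechanism:

```python
# Hypothetical fact store: (entity, attribute) → known value.
FACTS = {
    ("auth-migration", "assigned_to"): "Maya",
}

def check_attribution(task, claimed_person):
    # Compare the claim against the recorded assignment.
    actual = FACTS.get((task, "assigned_to"))
    if actual and actual != claimed_person:
        return f"🔴 {task.upper()}: attribution conflict — {actual} was assigned, not {claimed_person}"
    return "✅ consistent"

print(check_attribution("auth-migration", "Soren"))
```

In the real system the right-hand side of the comparison comes from the knowledge graph, so dates and tenures can be computed from validity windows rather than stored as strings.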
# Mine each project's conversations
mempalace mine ~/chats/orion/ --mode convos --wing orion
mempalace mine ~/chats/nova/ --mode convos --wing nova
mempalace mine ~/chats/helios/ --mode convos --wing helios
# Six months later: "why did I use Postgres here?"
mempalace search "database decision" --wing orion
# → "Chose Postgres over SQLite because Orion needs concurrent writes
# and the dataset will exceed 10GB. Decided 2025-11-03."
# Cross-project search
mempalace search "rate limiting approach"
# → finds your approach in Orion AND Nova, shows the differences

# Mine Slack exports and AI conversations
mempalace mine ~/exports/slack/ --mode convos --wing driftwood
mempalace mine ~/.claude/projects/ --mode convos
# "What did Soren work on last sprint?"
mempalace search "Soren sprint" --wing driftwood
# → 14 closets: OAuth refactor, dark mode, component library migration
# "Who decided to use Clerk?"
mempalace search "Clerk decision" --wing driftwood
# → "Kai recommended Clerk over Auth0 — pricing + developer experience.
#   Team agreed 2026-01-15. Maya handling the migration."

Some transcript exports concatenate multiple sessions into one huge file:
mempalace split ~/chats/ # split into per-session files
mempalace split ~/chats/ --dry-run # preview first
mempalace split ~/chats/ --min-sessions 3 # only split files with 3+ sessions

Temporal entity-relationship triples — like Zep's Graphiti, but SQLite instead of Neo4j. Local and free.
from mempalace.knowledge_graph import KnowledgeGraph
kg = KnowledgeGraph()
kg.add_triple("Kai", "works_on", "Orion", valid_from="2025-06-01")
kg.add_triple("Maya", "assigned_to", "auth-migration", valid_from="2026-01-15")
kg.add_triple("Maya", "completed", "auth-migration", valid_from="2026-02-01")
# What's Kai working on?
kg.query_entity("Kai")
# → [Kai → works_on → Orion (current), Kai → recommended → Clerk (2026-01)]
# What was true in January?
kg.query_entity("Maya", as_of="2026-01-20")
# → [Maya → assigned_to → auth-migration (active)]
# Timeline
kg.timeline("Orion")
# → chronological story of the project

Facts have validity windows. When something stops being true, invalidate it:
kg.invalidate("Kai", "works_on", "Orion", ended="2026-03-01")

Now queries for Kai's current work won't return Orion. Historical queries still will.
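How validity windows make `as_of` queries work, in a pure-Python sketch (the real store is SQLite; the tuple layout and `query` helper here are illustrative, not the actual schema):

```python
from datetime import date

# (subject, predicate, object, valid_from, valid_to) — valid_to=None means still true.
triples = [
    ("Kai", "works_on", "Orion", date(2025, 6, 1), date(2026, 3, 1)),
    ("Maya", "assigned_to", "auth-migration", date(2026, 1, 15), None),
]

def query(subject, as_of=None):
    # A fact holds at `as_of` if its validity window contains that date.
    as_of = as_of or date.today()
    return [
        (s, p, o) for s, p, o, start, end in triples
        if s == subject and start <= as_of and (end is None or as_of < end)
    ]

print(query("Kai", as_of=date(2026, 1, 20)))  # → [('Kai', 'works_on', 'Orion')]
print(query("Kai", as_of=date(2026, 4, 1)))   # → [] — fact ended 2026-03-01
```

Invalidation is just setting the window's end date — nothing is deleted, so historical queries keep working.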
Feed text to the KG and let it extract entities and relationships automatically — no manual triple construction needed.
from mempalace.kg_extraction import EntityTripleExtractor
extractor = EntityTripleExtractor(kg)
extractor.extract("Alice Chen joined Acme Corp as a senior engineer. She works with Bob on the payments team.")
# → Entities: Alice Chen (person), Acme Corp (org), Bob (person)
# → Triples: Alice Chen→joined→Acme Corp, Alice Chen→works_with→Bob

- Zero-cost baseline: Uses spaCy NER or regex fallback — no API key needed
- LLM upgrade: Set `ANTHROPIC_API_KEY` and extraction automatically upgrades to Claude Haiku for semantic triples
- Ingest integration: Extraction runs automatically when conversations are ingested via `mine_convos`
Install NER support: pip install mempalace[nlp] && python -m spacy download en_core_web_sm
Walk the graph beyond direct neighbors. Find how entities connect across multiple hops.
# Discover everything within 2 hops of Alice
kg.traverse("Alice", depth=2)
# → nodes at depth 0 (Alice), depth 1 (Acme, Bob), depth 2 (payments team, ...)
# How are Alice and Carol connected?
kg.find_path("Alice", "Carol")
# → Alice → works_at → Acme ← works_at ← Carol (length: 2)

- BFS traversal with depth cap (max 3 hops)
- Shortest path finding between any two entities
- Temporal filtering (`as_of`) and confidence thresholds on both
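Both operations above are plain breadth-first search over the triple graph. A self-contained sketch — the adjacency map is a toy, and this is the textbook algorithm rather than the library's actual implementation (which also applies `as_of` and confidence filters):

```python
from collections import deque

# Toy undirected adjacency map built from triples like Alice→works_at→Acme.
EDGES = {
    "Alice": ["Acme", "Bob"],
    "Acme": ["Alice", "Carol"],
    "Bob": ["Alice"],
    "Carol": ["Acme"],
}

def traverse(start, depth=2):
    # Depth-capped BFS: returns each reachable node with its hop distance.
    seen, frontier = {start: 0}, deque([start])
    while frontier:
        node = frontier.popleft()
        if seen[node] == depth:
            continue  # don't expand past the cap
        for nbr in EDGES.get(node, []):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                frontier.append(nbr)
    return seen

def find_path(a, b):
    # BFS shortest path, reconstructed from parent pointers.
    parents, frontier = {a: None}, deque([a])
    while frontier:
        node = frontier.popleft()
        if node == b:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in EDGES.get(node, []):
            if nbr not in parents:
                parents[nbr] = node
                frontier.append(nbr)
    return None

print(traverse("Alice", depth=2))   # → {'Alice': 0, 'Acme': 1, 'Bob': 1, 'Carol': 2}
print(find_path("Alice", "Carol"))  # → ['Alice', 'Acme', 'Carol']
```

The depth cap (max 3 hops in the real tools) keeps traversal bounded even on dense graphs, since BFS frontiers can grow exponentially with hop count.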
| Feature | MemPalace | Zep (Graphiti) |
|---|---|---|
| Storage | SQLite (local) | Neo4j (cloud) |
| Cost | Free | $25/mo+ |
| Temporal validity | Yes | Yes |
| Auto-extraction | NER + optional LLM | LLM required |
| Multi-hop traversal | BFS (depth 1-3) | BFS + community detection |
| Entity resolution | Planned | LLM-based |
| Self-hosted | Always | Enterprise only |
| Privacy | Everything local | SOC 2, HIPAA |
Create agents that focus on specific areas. Each agent gets its own wing and diary in the palace — not in your CLAUDE.md. Add 50 agents, your config stays the same size.
~/.mempalace/agents/
├── reviewer.json # code quality, patterns, bugs
├── architect.json # design decisions, tradeoffs
└── ops.json # deploys, incidents, infra
Your CLAUDE.md just needs one line:
You have MemPalace agents. Run mempalace_list_agents to see them.
The AI discovers its agents from the palace at runtime. Each agent:
- Has a focus — what it pays attention to
- Keeps a diary — written in AAAK, persists across sessions
- Builds expertise — reads its own history to stay sharp in its domain
# Agent writes to its diary after a code review
mempalace_diary_write("reviewer",
"PR#42|auth.bypass.found|missing.middleware.check|pattern:3rd.time.this.quarter|★★★★")
# Agent reads back its history
mempalace_diary_read("reviewer", last_n=10)
# → last 10 findings, compressed in AAAK
Each agent is a specialist lens on your data. The reviewer remembers every bug pattern it's seen. The architect remembers every design decision. The ops agent remembers every incident. They don't share a scratchpad — they each maintain their own memory.
Letta charges $20–200/mo for agent-managed memory. MemPalace does it with a wing.
# Via plugin (recommended)
claude plugin marketplace add milla-jovovich/mempalace
claude plugin install --scope user mempalace
# Or manually
claude mcp add mempalace -- python -m mempalace.mcp_server

Palace (read)
| Tool | What |
|---|---|
| mempalace_status | Palace overview + graph stats + KG stats |
| mempalace_taxonomy | Full wing → room → count tree (merged: list_wings + list_rooms + get_taxonomy) |
| mempalace_search | Semantic search with wing/room filters and snippet_len truncation |
| mempalace_check_duplicate | Check before filing |
| mempalace_get_aaak_spec | AAAK dialect reference |
Palace (write)
| Tool | What |
|---|---|
| mempalace_add_drawer | File verbatim content |
| mempalace_delete_drawer | Remove by ID |
Knowledge Graph
| Tool | What |
|---|---|
| mempalace_kg_query | Entity relationships with temporal as_of filtering |
| mempalace_kg_add | Add facts with optional validity window |
| mempalace_kg_invalidate | Mark facts as ended |
| mempalace_kg_timeline | Chronological entity story |
| mempalace_kg_extract | Auto-extract entities and triples from text (NER + optional LLM) |
| mempalace_kg_traverse | Multi-hop BFS traversal from an entity (depth 1-3) |
| mempalace_kg_find_path | Shortest path between two entities |
Navigation
| Tool | What |
|---|---|
| mempalace_traverse | Walk the graph from a room across wings |
| mempalace_find_tunnels | Find rooms bridging two wings |
Session
| Tool | What |
|---|---|
| session_checkpoint | Save state before /clear |
| session_restore | Restore state + wake-up after /clear |
| session_list | List projects with saved checkpoints |
Agent Diary
| Tool | What |
|---|---|
| mempalace_diary_write | Write AAAK diary entry |
| mempalace_diary_read | Read recent diary entries |
The AI learns AAAK and the memory protocol automatically from the mempalace_status response. No manual configuration.
Tested on standard academic benchmarks — reproducible, published datasets.
| Benchmark | Mode | Score | API Calls |
|---|---|---|---|
| LongMemEval R@5 | Raw (ChromaDB only) | 96.6% | Zero |
| LongMemEval R@5 | Hybrid + Haiku rerank | 100% (500/500) | ~500 |
| LoCoMo R@10 | Raw, session level | 60.3% | Zero |
| Personal palace R@10 | Heuristic bench | 85% | Zero |
| Palace structure impact | Wing+room filtering | +34% R@10 | Zero |
The 96.6% raw score is the highest published LongMemEval result requiring no API key, no cloud, and no LLM at any stage.
| System | LongMemEval R@5 | API Required | Cost |
|---|---|---|---|
| MemPalace (hybrid) | 100% | Optional | Free |
| Supermemory ASMR | ~99% | Yes | — |
| MemPalace (raw) | 96.6% | None | Free |
| Mastra | 94.87% | Yes (GPT) | API costs |
| Mem0 | ~85% | Yes | $19–249/mo |
| Zep | ~85% | Yes | $25/mo+ |
# Setup
mempalace init <dir> # guided onboarding + AAAK bootstrap
# Mining
mempalace mine <dir> # mine project files
mempalace mine <dir> --mode convos # mine conversation exports
mempalace mine <dir> --mode convos --wing myapp # tag with a wing name
# Splitting
mempalace split <dir> # split concatenated transcripts
mempalace split <dir> --dry-run # preview
# Search
mempalace search "query" # search everything
mempalace search "query" --wing myapp # within a wing
mempalace search "query" --room auth-migration # within a room
# Memory stack
mempalace wake-up # load L0 + L1 context
mempalace wake-up --wing driftwood # project-specific
# Compression
mempalace compress --wing myapp # AAAK compress
# Status
mempalace status # palace overview
# MCP
mempalace mcp # show MCP setup command

All commands accept --palace <path> to override the default location.
{
"palace_path": "/custom/path/to/palace",
"collection_name": "mempalace_drawers",
"people_map": {"Kai": "KAI", "Priya": "PRI"}
}

Generated by mempalace init. Maps your people and projects to wings:
{
"default_wing": "wing_general",
"wings": {
"wing_kai": {"type": "person", "keywords": ["kai", "kai's"]},
"wing_driftwood": {"type": "project", "keywords": ["driftwood", "analytics", "saas"]}
}
}

Plain text. Becomes Layer 0 — loaded every session.
| File | What |
|---|---|
| cli.py | CLI entry point |
| config.py | Configuration loading and defaults |
| normalize.py | Converts 5 chat formats to standard transcript |
| mcp_server.py | MCP server — 19 tools, AAAK auto-teach, memory protocol |
| miner.py | Project file ingest |
| convo_miner.py | Conversation ingest — chunks by exchange pair |
| searcher.py | Semantic search via ChromaDB |
| layers.py | 4-layer memory stack |
| dialect.py | AAAK compression — lossy abbreviation dialect |
| knowledge_graph.py | Temporal entity-relationship graph (SQLite) |
| palace_graph.py | Room-based navigation graph |
| onboarding.py | Guided setup — generates AAAK bootstrap + wing config |
| entity_registry.py | Entity code registry |
| entity_detector.py | Auto-detect people and projects from content |
| split_mega_files.py | Split concatenated transcripts into per-session files |
| hooks/mempal_save_hook.sh | Auto-save every N messages |
| hooks/mempal_precompact_hook.sh | Emergency save before compaction |
mempalace/
├── README.md ← you are here
├── mempalace/ ← core package (README)
│ ├── cli.py ← CLI entry point
│ ├── mcp_server.py ← MCP server (19 tools)
│ ├── knowledge_graph.py ← temporal entity graph
│ ├── palace_graph.py ← room navigation graph
│ ├── dialect.py ← AAAK compression
│ ├── miner.py ← project file ingest
│ ├── convo_miner.py ← conversation ingest
│ ├── searcher.py ← semantic search
│ ├── onboarding.py ← guided setup
│ └── ... ← see mempalace/README.md
├── benchmarks/ ← reproducible benchmark runners
│ ├── README.md ← reproduction guide
│ ├── BENCHMARKS.md ← full results + methodology
│ ├── longmemeval_bench.py ← LongMemEval runner
│ ├── locomo_bench.py ← LoCoMo runner
│ └── membench_bench.py ← MemBench runner
├── hooks/ ← Claude Code auto-save hooks
│ ├── README.md ← hook setup guide
│ ├── mempal_save_hook.sh ← save every N messages
│ └── mempal_precompact_hook.sh ← emergency save
├── examples/ ← usage examples
│ ├── basic_mining.py
│ ├── convo_import.py
│ └── mcp_setup.md
├── tests/ ← test suite (README)
├── assets/ ← logo + brand assets
└── pyproject.toml ← package config (v3.0.0)
- Python 3.9+
- chromadb>=0.4.0
- pyyaml>=6.0
No API key. No internet after install. Everything local.
pip install mempalace

PRs welcome. See CONTRIBUTING.md for setup and guidelines.
MIT — see LICENSE.