feat: progressive disclosure + <private> tag filter by zackchiutw · Pull Request #1033 · MemPalace/mempalace

zackchiutw · 2026-04-19T13:04:05Z

Summary

Two privacy-minded additions to the MCP search and write paths:

1. `<private>...</private>` tag handling on write

tool_add_drawer and tool_diary_write now strip <private>…</private> blocks (case-insensitive, multiline, non-greedy) before sanitize + embed. Entries whose entire content sits inside a private tag are refused, so nothing leaks through metadata, drawer IDs, or the embedding itself.
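The stripping rule above can be sketched as follows. This is a minimal illustration, not the code in `mempalace/privacy.py`: only the name `redact_private` and its `(cleaned, is_fully_private)` return shape come from this PR. The regex, the whitespace trimming, and the `guard_write` helper with its response dictionary are assumptions made for the example.

```python
import re
from typing import Dict, Tuple

# Assumed pattern: IGNORECASE matches <PRIVATE>/<Private>, DOTALL lets a block
# span several lines, and ".*?" is non-greedy so two blocks are not merged.
_PRIVATE_RE = re.compile(r"<private>.*?</private>", re.IGNORECASE | re.DOTALL)


def redact_private(text: str) -> Tuple[str, bool]:
    """Return (cleaned, is_fully_private) for a candidate drawer or diary entry."""
    had_private = _PRIVATE_RE.search(text) is not None
    cleaned = _PRIVATE_RE.sub("", text).strip()
    # Fully private: at least one block was present and nothing public remains.
    return cleaned, had_private and cleaned == ""


def guard_write(content: str) -> Dict[str, object]:
    """Hypothetical write-path guard: refuse entries with nothing public left."""
    cleaned, fully_private = redact_private(content)
    if fully_private:
        return {"success": False, "reason": "content is entirely private"}
    return {"success": True, "content": cleaned}
```

Running the redaction before any sanitizing or embedding step is what keeps private text out of the vector store; the guard only decides whether anything is left to store.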

2. Progressive disclosure on read

tool_search returns a ~30-char summary + drawer_id per hit by default. Callers that need the raw text opt in via full=true, or fetch on demand via tool_get_drawer (which now accepts a single ID or an array). Stored content stays verbatim — only the preview is truncated — so this respects the "Verbatim always" design principle in CLAUDE.md. The two-step flow also keeps agent context small when search is exploratory.
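A preview helper along these lines would produce the compact summaries. It is a sketch under stated assumptions: the name and the `(text, n_chars)` parameters come from the file table in this PR, and the word-boundary back-off follows the reviewer notes, but the default of 30, the whitespace collapsing, and the trailing `...` marker are guesses for illustration.

```python
def summarize_for_search(text: str, n_chars: int = 30) -> str:
    """Return a short preview; the stored drawer text is never modified."""
    # Assumption: collapse runs of whitespace so previews stay on one line.
    collapsed = " ".join(text.split())
    if len(collapsed) <= n_chars:
        return collapsed
    cut = collapsed[:n_chars]
    # If the cut lands mid-word, back up to the last space inside the budget.
    if collapsed[n_chars] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]
    # Assumption: mark truncation with an ASCII ellipsis.
    return cut.rstrip() + "..."
```

Because only the returned preview is shortened, a caller can still fetch the verbatim text later by `drawer_id`.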

Why

Private tags let users mark thoughts they want remembered locally but never fed back to the model — useful for rough reasoning, personal notes, and conversation drafts. Progressive disclosure is the read-side complement: long drawer text no longer floods the agent response when it only needed to know which memory matched.

What changes

| File | Change |
| --- | --- |
| `mempalace/privacy.py` (new) | `redact_private(text) -> (cleaned, is_fully_private)` and `summarize_for_search(text, n_chars)` |
| `mempalace/mcp_server.py` | `tool_search` gains `full=False` default + schema entry; `tool_get_drawer` accepts str or list[str]; `tool_add_drawer` / `tool_diary_write` strip + reject fully-private content |
| `mempalace/searcher.py` | Hits now include `drawer_id` pulled from chroma `ids` so the two-step fetch round-trips reliably |
| `tests/test_privacy.py` (new) | 17 unit tests covering redaction edge cases + summary truncation |
| `tests/test_mcp_server.py` | New `TestProgressiveDisclosure` class (20+ integration tests) |
| `mempalace/instructions/search.md` | Documents the default summary response, `full=true` opt-out, and the recommended `search → get_drawer` flow |
| `mempalace/instructions/help.md` | Adds a Privacy section + updates the write-side tool list |

Compatibility notes

- Default response shape change. tool_search no longer returns full drawer text by default. Callers that depended on full text can pass full=true to restore the old behavior. LLM callers typically benefit from the new default (smaller responses, clear drawer_id for follow-up).
- No stored data is altered or re-embedded. Redaction only affects new writes.
- tool_get_drawer(id=...) is backwards compatible (single ID still works); new array form is additive.
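One way to keep the single-ID form working while adding the array form is to normalize the argument up front. The helper below is hypothetical: its name, its error message, and its choice to raise `TypeError` are assumptions, and the PR does not say how `tool_get_drawer` handles bad input.

```python
from typing import List, Union


def normalize_drawer_ids(id_arg: Union[str, List[str]]) -> List[str]:
    """Accept id="abc" or id=["abc", "def"] and always return a list of IDs."""
    if isinstance(id_arg, str):
        # Legacy single-ID call: wrap it so downstream code sees one shape.
        return [id_arg]
    if isinstance(id_arg, list) and all(isinstance(i, str) for i in id_arg):
        return list(id_arg)
    raise TypeError("id must be a string or a list of strings")
```

Checking for `str` first matters because a string is itself iterable; treating it as a sequence would split `"abc"` into three one-character IDs.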

Example — private tag

tool_add_drawer(
    wing="brainstorm", room="ideas",
    content="Public idea: build a MCP logger.\n<private>Skeptical — might be over-engineering.</private>"
)
# Stores and embeds only:  "Public idea: build a MCP logger."

Example — progressive disclosure

# 1. broad search returns compact previews
results = tool_search(query="auth rework", limit=3)
#  [{"drawer_id": "abc123", "text": "Switched to JWT after the session-leak incident...", ...}, ...]

# 2. fetch only the hits you actually want verbatim
tool_get_drawer(id=["abc123", "def456"])

Test plan

- pytest tests/test_privacy.py tests/test_searcher.py tests/test_mcp_server.py: 112 passed
- ruff check: clean
- ruff format --check on modified files: clean
- <private> stripping is case-insensitive (<PRIVATE>, <Private> all work)
- Fully-private entry rejected with success=False + reason field
- tool_get_drawer accepts both id="abc" and id=["abc", "def"]
- tool_search(full=true) returns the pre-existing full-text response shape

Notes for reviewers

- This PR is independent of #1032 (feat: add Weibull decay and rerank pipeline for search results). Both touch the tool_search schema in mempalace/mcp_server.py, but at different call sites, so a trivial merge conflict is expected if both land.
- The summary character budget (~30 chars, backing up to the last word boundary) is hard-coded in privacy.py. Happy to make that configurable if reviewers prefer.

🤖 Generated with Claude Code

Two privacy-minded additions to the search and write paths:

1. ``<private>...</private>`` tag handling on write. ``tool_add_drawer`` and ``tool_diary_write`` strip the marked blocks before sanitize + embed. Entries whose entire content is inside a private tag are refused outright so nothing leaks through metadata, drawer IDs, or the embedding itself.

2. Progressive disclosure on read. ``tool_search`` returns a ~30-char summary + ``drawer_id`` per hit by default; callers opt in to the full verbatim drawer text via ``full=true`` or fetch on demand via ``tool_get_drawer`` (now accepts a single ID or an array). Stored content stays verbatim (only the preview is truncated), so this does not violate the "verbatim always" principle.

Supporting changes:

- ``mempalace/privacy.py``: ``redact_private`` + ``summarize_for_search``
- ``searcher.py``: hits now carry ``drawer_id`` from chroma ids so the two-step flow can round-trip reliably
- Full test coverage in ``tests/test_privacy.py`` and new progressive disclosure test class in ``tests/test_mcp_server.py``

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Document the new search/write semantics so LLM callers discover the two-step fetch naturally instead of re-requesting full results.

- search.md: describe the default summary + drawer_id response, the ``full=true`` opt-out, and the recommended search→get_drawer flow.
- help.md: add a Privacy section covering <private> tag stripping, the summary-only default, and the expanded ``mempalace_get_drawer`` (single id or array) in the write-side tool list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Scanned all 233 open upstream PRs today against our open PRs and fork-ahead / planned-work items. Findings merged into README:

- P2 (decay) and P3 Tier-0 (LLM rerank): both covered by MemPalace#1032 (@zackchiutw, MERGEABLE, 2026-04-19; Weibull decay + 4-stage rerank pipeline). Older simpler version at MemPalace#337. Dropped as fork work; watching MemPalace#1032.
- P7 (alternative storage): formally out of scope. RFC 001 MemPalace#743 (@igorls) defines the plugin contract; four backend PRs already in flight (MemPalace#700, MemPalace#381 Qdrant; MemPalace#574, MemPalace#575 LanceDB). Fork consumes, does not rebuild.
- P0 (multi-label tags): still fork/upstream candidate. MemPalace#1033 (@zackchiutw) ships adjacent privacy-tag + progressive disclosure but not the full multi-label scheme.
- Merged MemPalace#1023 section acknowledges complementary MemPalace#976 (felipetruman) which adds broader mine_global_lock() + HNSW num_threads pin.

Gives future-us a map so we don't re-file MemPalace#1036-style duplicates.

zackchiutw and others added 2 commits April 19, 2026 21:03

zackchiutw requested review from bensig, igorls and milla-jovovich as code owners April 19, 2026 13:04

igorls added the enhancement (New feature or request) and area/search (Search and retrieval) labels Apr 24, 2026

zackchiutw wants to merge 2 commits into MemPalace:develop from zackchiutw:feat/progressive-disclosure-private-tag
