feat: progressive disclosure + <private> tag filter #1033

Open
zackchiutw wants to merge 2 commits into MemPalace:develop from zackchiutw:feat/progressive-disclosure-private-tag

@zackchiutw

Summary

Two privacy-minded additions to the MCP search and write paths:

1. <private>...</private> tag handling on write

tool_add_drawer and tool_diary_write now strip <private>…</private> blocks (case-insensitive, multiline, non-greedy) before sanitize + embed. Entries whose entire content sits inside a private tag are refused, so nothing leaks through metadata, drawer IDs, or the embedding itself.
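The diff for mempalace/privacy.py is not shown on this page, so the following is only a minimal sketch of how the stripping and refusal described above could work. The regex, the `guard_write` helper, and the wording of the `reason` string are assumptions for illustration; only the `redact_private(text) -> (cleaned, is_fully_private)` signature and the `success=False` + `reason` response come from the PR text.

```python
from __future__ import annotations

import re

# Assumed pattern: IGNORECASE matches <PRIVATE>/<Private>, DOTALL lets a block
# span several lines, and the non-greedy ".*?" keeps adjacent blocks separate.
_PRIVATE_RE = re.compile(r"<private>.*?</private>", re.IGNORECASE | re.DOTALL)


def redact_private(text: str) -> tuple[str, bool]:
    """Return (cleaned, is_fully_private) for a piece of drawer content."""
    had_private = _PRIVATE_RE.search(text) is not None
    cleaned = _PRIVATE_RE.sub("", text).strip()
    # Fully private: at least one private block was present and nothing else survives.
    return cleaned, had_private and not cleaned


def guard_write(content: str) -> dict:
    """Hypothetical write-path guard: refuse entries that are entirely private."""
    cleaned, is_fully_private = redact_private(content)
    if is_fully_private:
        return {"success": False, "reason": "content is entirely private"}
    # Only the cleaned text would go on to sanitization and embedding.
    return {"success": True, "content": cleaned}
```

In this sketch an unclosed `<private>` tag is left untouched, because the pattern requires a matching close tag; whether the real implementation behaves the same way is not stated in the PR.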

2. Progressive disclosure on read

tool_search returns a ~30-char summary + drawer_id per hit by default. Callers that need the raw text opt in via full=true, or fetch on demand via tool_get_drawer (which now accepts a single ID or an array). Stored content stays verbatim — only the preview is truncated — so this respects the "Verbatim always" design principle in CLAUDE.md. The two-step flow also keeps agent context small when search is exploratory.
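As a rough illustration of the read-side behavior, the sketch below shapes a single search hit depending on the `full` flag. The `shape_hit` helper and `PREVIEW_CHARS` constant are hypothetical names, and the plain slice is a stand-in for the real summarizer; the `drawer_id` and `text` fields follow the example response later in this description.

```python
from __future__ import annotations

PREVIEW_CHARS = 30  # the PR describes a ~30-char summary


def shape_hit(hit: dict, full: bool = False) -> dict:
    """Return a copy of the hit, with text shortened unless full=True."""
    shaped = dict(hit)  # copy, so the stored hit is never mutated
    if not full:
        # Stand-in truncation; the PR's summarizer backs up to a word boundary.
        shaped["text"] = hit["text"][:PREVIEW_CHARS]
    return shaped


stored = {"drawer_id": "abc123", "text": "Switched to JWT after the session-leak incident."}
preview = shape_hit(stored)             # compact preview plus drawer_id
verbatim = shape_hit(stored, full=True)  # full text, as with full=true
```

The point of the copy is the "Verbatim always" principle: only the response is shortened, never the stored content.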

Why

Private tags let users mark thoughts they want remembered locally but never fed back to the model — useful for rough reasoning, personal notes, and conversation drafts. Progressive disclosure is the read-side complement: long drawer text no longer floods the agent response when it only needed to know which memory matched.

What changes

| File | Change |
| --- | --- |
| mempalace/privacy.py (new) | redact_private(text) -> (cleaned, is_fully_private) and summarize_for_search(text, n_chars) |
| mempalace/mcp_server.py | tool_search gains full=False default + schema entry; tool_get_drawer accepts str or list[str]; tool_add_drawer / tool_diary_write strip + reject fully-private content |
| mempalace/searcher.py | Hits now include drawer_id pulled from chroma ids so the two-step fetch round-trips reliably |
| tests/test_privacy.py (new) | 17 unit tests covering redaction edge cases + summary truncation |
| tests/test_mcp_server.py | New TestProgressiveDisclosure class (20+ integration tests) |
| mempalace/instructions/search.md | Documents the default summary response, full=true opt-out, and the recommended search → get_drawer flow |
| mempalace/instructions/help.md | Adds a Privacy section + updates the write-side tool list |

Compatibility notes

  • Default response shape change. tool_search no longer returns full drawer text by default. Callers that depended on full text can pass full=true to restore the old behavior. LLM callers typically benefit from the new default (smaller responses, clear drawer_id for follow-up).
  • No stored data is altered or re-embedded. Redaction only affects new writes.
  • tool_get_drawer(id=...) is backwards compatible (single ID still works); new array form is additive.
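One way the single-ID and array forms could share a code path is to normalize the argument to a list up front. This is a sketch under assumptions: `normalize_drawer_ids`, `get_drawers`, and the in-memory `store` dict are hypothetical stand-ins, not the PR's actual code or storage layer.

```python
from __future__ import annotations


def normalize_drawer_ids(drawer_id: str | list[str]) -> list[str]:
    """Accept either a single ID or a list of IDs and always return a list."""
    if isinstance(drawer_id, str):
        return [drawer_id]
    return list(drawer_id)


def get_drawers(store: dict[str, str], drawer_id: str | list[str]) -> list[dict]:
    """Hypothetical fetch: return verbatim text for each known ID, in request order."""
    return [
        {"drawer_id": d, "text": store[d]}
        for d in normalize_drawer_ids(drawer_id)
        if d in store  # unknown IDs are skipped in this sketch
    ]
```

Checking for `str` first matters because a string is itself iterable; without that check, `"abc"` would be split into `["a", "b", "c"]`.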

Example — private tag

tool_add_drawer(
    wing="brainstorm", room="ideas",
    content="Public idea: build a MCP logger.\n<private>Skeptical — might be over-engineering.</private>"
)
# Stores and embeds only:  "Public idea: build a MCP logger."

Example — progressive disclosure

# 1. broad search returns compact previews
results = tool_search(query="auth rework", limit=3)
#  [{"drawer_id": "abc123", "text": "Switched to JWT after the session-leak incident...", ...}, ...]

# 2. fetch only the hits you actually want verbatim
tool_get_drawer(id=["abc123", "def456"])

Test plan

  • pytest tests/test_privacy.py tests/test_searcher.py tests/test_mcp_server.py — 112 passed
  • ruff check — clean
  • ruff format --check on modified files — clean
  • <private> stripping is case-insensitive (<PRIVATE>, <Private> all work)
  • Fully-private entry rejected with success=False + reason field
  • tool_get_drawer accepts both id="abc" and id=["abc", "def"]
  • tool_search(full=true) returns the pre-existing full-text response shape

Notes for reviewers

  • This PR is independent of #1032 (feat: add Weibull decay and rerank pipeline for search results). Both touch the tool_search schema in mempalace/mcp_server.py, but at different call sites; expect a trivial merge conflict if both land.
  • The summary character budget (~30 chars backing up to the last word boundary) is hard-coded in privacy.py. Happy to make that configurable if reviewers prefer.
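For reviewers weighing the configurability question, here is a minimal sketch of a truncation that backs up to the last word boundary, with the budget exposed as a parameter. Only the `summarize_for_search(text, n_chars)` signature and the ~30-char budget come from the PR; the whitespace collapsing, the default value, and the trailing `...` are assumptions.

```python
def summarize_for_search(text: str, n_chars: int = 30) -> str:
    """Shorten text to at most n_chars, preferring to cut at a word boundary."""
    collapsed = " ".join(text.split())  # assumption: newlines and runs of spaces collapse
    if len(collapsed) <= n_chars:
        return collapsed
    cut = collapsed[:n_chars]
    # If the cut lands mid-word, back up to the last space inside the budget.
    if collapsed[n_chars] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip() + "..."  # assumption: an ellipsis marks truncated previews


print(summarize_for_search("Switched to JWT after the session-leak incident"))
# prints: Switched to JWT after the...
```

A single word longer than the budget has no boundary to back up to, so this sketch falls back to a hard cut at `n_chars`.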

🤖 Generated with Claude Code

zackchiutw and others added 2 commits April 19, 2026 21:03
Two privacy-minded additions to the search and write paths:

1. ``<private>...</private>`` tag handling on write. ``tool_add_drawer``
   and ``tool_diary_write`` strip the marked blocks before sanitize +
   embed. Entries whose entire content is inside a private tag are
   refused outright so nothing leaks through metadata, drawer IDs, or
   the embedding itself.

2. Progressive disclosure on read. ``tool_search`` returns a ~30-char
   summary + ``drawer_id`` per hit by default; callers opt in to the
   full verbatim drawer text via ``full=true`` or fetch on demand via
   ``tool_get_drawer`` (now accepts a single ID or an array). Stored
   content stays verbatim — only the preview is truncated — so this
   does not violate the "verbatim always" principle.

Supporting changes:
- ``mempalace/privacy.py`` — ``redact_private`` + ``summarize_for_search``
- ``searcher.py`` — hits now carry ``drawer_id`` from chroma ids so the
  two-step flow can round-trip reliably
- Full test coverage in ``tests/test_privacy.py`` and new progressive
  disclosure test class in ``tests/test_mcp_server.py``

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document the new search/write semantics so LLM callers discover the
two-step fetch naturally instead of re-requesting full results.

- search.md: describe the default summary + drawer_id response, the
  ``full=true`` opt-out, and the recommended search→get_drawer flow.
- help.md: add a Privacy section covering <private> tag stripping,
  the summary-only default, and the expanded ``mempalace_get_drawer``
  (single id or array) in the write-side tool list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Scanned all 233 open upstream PRs today against our open PRs and
fork-ahead / planned-work items. Findings merged into README:

- P2 (decay) and P3 Tier-0 (LLM rerank): both covered by MemPalace#1032
  (@zackchiutw, MERGEABLE, 2026-04-19 — Weibull decay + 4-stage
  rerank pipeline). Older simpler version at MemPalace#337. Dropped as
  fork work; watching MemPalace#1032.
- P7 (alternative storage): formally out of scope. RFC 001 MemPalace#743
  (@igorls) defines the plugin contract; four backend PRs already
  in flight (MemPalace#700, MemPalace#381 Qdrant; MemPalace#574, MemPalace#575 LanceDB). Fork consumes,
  does not rebuild.
- P0 (multi-label tags): still fork/upstream candidate. MemPalace#1033
  (@zackchiutw) ships adjacent privacy-tag + progressive disclosure
  but not the full multi-label scheme.
- Merged MemPalace#1023 section acknowledges complementary MemPalace#976 (felipetruman)
  which adds broader mine_global_lock() + HNSW num_threads pin.

Gives future-us a map so we don't re-file MemPalace#1036-style duplicates.
@igorls added the enhancement (New feature or request) and area/search (Search and retrieval) labels on Apr 24, 2026