feat(cli): mempalace mined + purge --source-file#7
Merged
Conversation
Adds two ergonomic gaps in the mining-management surface JP asked for: 1. **mempalace purge --source-file <path>** — extends the existing purge command with a third filter alongside --wing and --room. Single-filter or combined ($and) usage works the same way wing/room do; the filter accepts a metadata.source_file exact-match. End-to- end test exercises a real palace and confirms only the matching drawers are deleted, leaving siblings intact. 2. **mempalace mined** — companion to `mempalace status`, but groups by wing × source_file rather than wing × room. Answers "which files have I mined into this wing?" so an operator can pick targets for the new --source-file purge. Skips drawers without a source_file metadata key (diary entries, kg drawers, etc.). Honors --wing and --limit (default 50 sources per wing, --limit 0 to show all). Uses the same paginated col.get() pattern as miner.status() to handle 100K+ drawer palaces without tripping sqlite's max-variable limit. Together these close the "removing manually mined data" half of JP's ask. The "automining when files change" half is queued for a separate file-watcher daemon PR. Pre-existing `mempalace mine <dir>` already covers "adding" (and re-mines on mtime change via bulk_check_mined dedup), so no command needed for refresh. Tests: 8 new test cases (3 for purge --source-file with mock + real palace; 3 for cmd_mined including end-to-end groupBy + wing filter + no-palace error path; 2 updated to add the new arg to the test namespace builder). Full suite: 1564 passed, 1 skipped, lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four findings from Copilot's review pass on the mining-management CLI: 1. **--wing filter pushed into the query** (Copilot cli.py:800). Previously cmd_mined fetched every drawer in the palace and filtered in Python, doing O(total) work even for "mempalace mined --wing X" on a 150K-drawer palace where X has 12K. Pre-fetch IDs via col.get(where=...) to size the loop, then pass where= on each batch get(). On the canonical palace, --wing filters now scan ~10x less data. 2. **Reject negative --limit at parse time** (Copilot cli.py:818). argparse type=int silently accepted -1, producing nonsensical "... -2 more" output. Add a typed _nonneg_int validator that raises ArgumentTypeError on negatives. argparse turns that into a clean exit code 2 with usage error message. 3. **Update module __doc__** (Copilot cli.py:1495). main() uses the module docstring as the help epilog. The hard-coded command list stopped at "status" — added "mined" and "purge --source-file" so `mempalace --help` reflects the new commands. 4. **Dispatch test for `mempalace mined`** (Copilot cli.py:1542). Sibling commands have main()-level dispatch tests; mined didn't. A typo or registration regression would have slipped through unit coverage. Added test_main_mined_dispatches and a paired test_main_mined_rejects_negative_limit (verifies #2). Tests: 68 cli tests pass; full suite green at 1566 passed, 1 skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR extends the MemPalace CLI with operator tooling to (1) list mined sources grouped by wing and (2) purge drawers based on the source_file metadata, closing the “remove manually mined data” half of the mining-management request.
Changes:
- Add
mempalace purge --source-file <path>and update purge filtering/error messaging to support wing/room/source-file criteria. - Add
mempalace minedto aggregate and display minedsource_fileentries grouped by wing (with--wingand--limitsupport, including argparse-time rejection of negative limits). - Add/extend CLI tests covering the new purge filter, mined output behavior, mined dispatch, and negative
--limitparsing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
mempalace/cli.py |
Implements cmd_mined, extends cmd_purge with --source-file, updates CLI help and dispatch, adds non-negative --limit parsing. |
tests/test_cli.py |
Adds coverage for purge-by-source-file, mined grouping/filtering, mined dispatch, and argparse rejection of negative --limit. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+787
to
+816
| where = {"wing": args.wing} if args.wing else None | ||
| if where is not None: | ||
| try: | ||
| scope = col.get(where=where, include=[]) | ||
| scope_ids = scope.get("ids") if isinstance(scope, dict) else getattr(scope, "ids", []) | ||
| total = len(scope_ids or []) | ||
| except Exception: | ||
| total = col.count() | ||
| else: | ||
| total = col.count() | ||
| wing_sources: dict = defaultdict(lambda: defaultdict(int)) | ||
| batch_size = 5000 | ||
| offset = 0 | ||
| while offset < total: | ||
| kwargs = {"limit": batch_size, "offset": offset, "include": ["metadatas"]} | ||
| if where is not None: | ||
| kwargs["where"] = where | ||
| r = col.get(**kwargs) | ||
| batch = r.get("metadatas") if isinstance(r, dict) else getattr(r, "metadatas", []) | ||
| if not batch: | ||
| break | ||
| for m in batch: | ||
| m = m or {} | ||
| src = m.get("source_file") | ||
| if not src: | ||
| continue | ||
| wing = m.get("wing", "?") | ||
| wing_sources[wing][src] += 1 | ||
| offset += len(batch) | ||
|
|
jphein
added a commit
that referenced
this pull request
May 6, 2026
Four findings, batched into one cleanup PR. PR #6 — hooks_cli.hook_precompact docstring lie. Said "mine the transcript synchronously" but the local-mode _ingest_transcript uses subprocess.Popen (background) and daemon-strict mode is fire-and-forget HTTP. Rewrite docstring to describe the actual two-step flow: (1) transcript ingest (best-effort, may still be running when compaction starts), (2) _mine_sync of MEMPAL_DIR (real synchronous subprocess.run). PR #7 — cmd_mined unpaginated col.get + silent truncation. Old flow called col.get(where=..., include=[]) once to size the loop, then paginated with the same where clause. ChromaDB's get() has a silent ~10K id truncation, so the upfront sizing call would undercount on a wing >10K drawers. Drop the upfront sizing entirely; loop on empty-batch instead. Plus an early-exit when a batch is shorter than batch_size (saves one trailing empty get on exact-multiple wing counts). PR #8 — cmd_repair docstring stale. Said "rebuild palace vector index" but the function still dispatches max-seq-id mode. Rewrite to describe both modes + acknowledge the deleted reorganize mode. PR #10 — non-Projects dashed-project divergence. Documented limitation. Tried adding -dev-/-code-/etc. markers; reverted because those are also ambiguous (`~/dev/<project>` and `~/dev/<parent>/<project>` encode identically — a -dev-<rest> match would catch parent+project, breaking the existing test_wing_from_transcript_path_non_projects_layout test that expects the last-token fallback for `-dev-MemPalace-mempalace` → `mempalace`). Instead document the limitation in the docstring and add a regression test pinning the current behavior so future "fixes" don't break the layouts that DO want the fallback. Tests: +1 (regression test for the documented limitation). Full suite: 1552 passed, 1 skipped, ruff clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Reopened after parent #2 squash-merge closed the original PR (#4).
Closes the 'removing manually mined data' half of JP's mining-management ask. Adds:
mempalace purge --source-file <path>(extends existing purge with a third filter)mempalace mined(lists mined source files grouped by wing × source_file)Addresses 4 Copilot findings from original PR #4:
whereclause (no full-collection scan)--limitat parse time__doc__epilog updated with new commandsmempalace minedOriginal PR: #4 (closed by parent merge)
Tests: 1566 pass, ruff clean.
🤖 Generated with Claude Code