feat(init): optional LLM-assisted entity refinement + Claude Code convo scanner (phase 2)#1150
Claude Code stores sessions under `~/.claude/projects/<slug>/<id>.jsonl` where `<slug>` is the original CWD with `/` replaced by `-`. That encoding is lossy — it can't distinguish `foo-bar` (one segment) from `foo/bar` (two) — so slug-decoding alone produces wrong names for any hyphenated project. Fortunately, every message record carries a `cwd` field with the true path. This scanner reads one record per session to recover the accurate project name deterministically, falling back to slug-decoding only if the JSONL is malformed or empty.

The output shape matches `project_scanner.ProjectInfo` so the discover orchestrator can union results across sources. Session count doubles as a density signal for ranking.

22 unit tests cover: root detection, cwd extraction with malformed-input tolerance, fallback slug decoding, name resolution using the newest session (so renames win), and dedup when two encoded dirs resolve to the same project.
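A minimal sketch of that recovery logic, assuming `cwd` is a top-level key in each JSONL record; the helper names here are illustrative, not the actual `convo_scanner` API:

```python
import json
from pathlib import Path
from typing import Optional


def read_session_cwd(session: Path) -> Optional[str]:
    """Return the `cwd` from the first parseable record, tolerating bad lines."""
    try:
        with session.open(encoding="utf-8") as fh:
            for line in fh:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed records
                if isinstance(record, dict) and record.get("cwd"):
                    return record["cwd"]
    except OSError:
        pass  # unreadable session: caller falls back to slug decoding
    return None


def decode_slug(slug: str) -> str:
    """Lossy fallback: every '-' becomes '/', so 'foo-bar' decodes wrongly."""
    return "/" + slug.lstrip("-").replace("-", "/")
```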
Three providers cover the useful space while keeping the zero-API default:

- `ollama` (default): local models via http://localhost:11434. Works fully offline. Tag-matching check accepts both `model` and `model:latest` forms.
- `openai-compat`: any /v1/chat/completions endpoint. Covers OpenRouter, LM Studio, llama.cpp server, vLLM, Groq, Together, Fireworks, and most self-hosted frameworks. API key falls back to $OPENAI_API_KEY. Endpoint normalization is forgiving about trailing `/v1`.
- `anthropic`: Messages API v2023-06-01. API key falls back to $ANTHROPIC_API_KEY. Concatenates multi-block text responses.

JSON mode is normalized across providers: Ollama uses `format: "json"`, OpenAI-compat uses `response_format`, Anthropic uses a prompt-level instruction. Callers request JSON once; this module handles the provider-specific plumbing.

No external SDK dependency; stdlib `urllib` throughout. HTTP errors are wrapped into a single `LLMError` class so callers don't need to distinguish transport, auth, and parse failures at the call site.

26 tests, all with mocked HTTP; the suite runs offline with no real provider required.
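That normalization amounts to a per-provider payload tweak, roughly as sketched below. The request fields match the public provider APIs, but `build_json_payload` itself is illustrative, not the module's real interface:

```python
def build_json_payload(provider: str, model: str, prompt: str) -> dict:
    """Ask each provider for JSON output using its own mechanism."""
    if provider == "ollama":
        # Ollama's generate endpoint accepts format="json"
        return {"model": model, "prompt": prompt, "format": "json", "stream": False}
    if provider == "openai-compat":
        # OpenAI-style chat completions accept response_format
        return {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "response_format": {"type": "json_object"},
        }
    if provider == "anthropic":
        # No native JSON mode: fall back to a prompt-level instruction.
        # max_tokens is required by the Messages API; the value is arbitrary here.
        return {
            "model": model,
            "max_tokens": 4096,
            "messages": [
                {"role": "user", "content": prompt + "\n\nRespond with JSON only."}
            ],
        }
    raise ValueError(f"unknown provider: {provider}")
```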
Takes the candidate set produced by phase-1 detection (manifests, git authors, regex on prose) and asks an LLM to reclassify each candidate as PERSON / PROJECT / TOPIC / COMMON_WORD / AMBIGUOUS.

Scale approach: never feed the raw corpus to the LLM. For each candidate, collect up to 3 context lines from sampled prose, cap each at 240 chars, and batch 25 candidates per call. This keeps total input around 50-100K tokens even on large corpora and completes in a few minutes on a 4B local model.

Interactive UX:
- Stderr progress bar with the current candidate name, updated per batch.
- Ctrl-C interrupts cleanly: returns a RefineResult with `cancelled=True` and whatever was classified before the interrupt. The partial result is safe to pass straight to confirm_entities.
- Per-batch errors (transport, parse) are recorded in `errors` and don't abort the whole run.

Refinement scope: only `uncertain` and low-confidence `projects` entries are sent. Manifest-backed projects (conf >= 0.95) and git-authored people are already authoritative and skip the LLM.

The response parser is defensive: it accepts `label` or `type` keys, lowercase/uppercase variants, a top-level list or wrapped object, and strips markdown code fences. Unknown labels become AMBIGUOUS so the user reviews them rather than silently accepting a bad classification.

`collect_corpus_text` provides a simple stratified prose sampler (recent first, capped per file) so callers don't need to build their own corpus window.

28 tests with a FakeProvider (no network). Covers context collection, prompt building, response parsing variants, classification apply, end-to-end refine, and Ctrl-C partial-result behavior.
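The scale caps translate directly into constants, sketched below with hypothetical names; the limits (3 context lines, 240 chars each, 25 candidates per batch) come from this description, while the function shapes are not the real `llm_refine` internals:

```python
from itertools import islice
from typing import Iterator

MAX_CONTEXT_LINES = 3    # context lines collected per candidate
MAX_CONTEXT_CHARS = 240  # cap per line
BATCH_SIZE = 25          # candidates per LLM call


def collect_contexts(name: str, prose_lines: list[str]) -> list[str]:
    """Up to MAX_CONTEXT_LINES capped snippets mentioning the candidate."""
    # Substring matching, as in the initial phase-2 code; a later commit
    # (see below) tightens this to a word-boundary regex.
    hits = (
        line.strip()[:MAX_CONTEXT_CHARS]
        for line in prose_lines
        if name.lower() in line.lower()
    )
    return list(islice(hits, MAX_CONTEXT_LINES))


def batched(candidates: list[str]) -> Iterator[list[str]]:
    """Yield candidates in BATCH_SIZE groups, one LLM call each."""
    for i in range(0, len(candidates), BATCH_SIZE):
        yield candidates[i : i + BATCH_SIZE]
```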
Extends the init orchestrator to consume two new signal sources:
1. Claude Code conversation dirs: when the target is a
`~/.claude/projects/` root, convo_scanner contributes ProjectInfo
entries alongside the git/manifest projects. Dedup is by name,
preferring the entry with more user-authored activity (see the
sketch after this list).
2. Optional LLM refinement: when --llm is passed, discover_entities
constructs the provider, validates availability, and runs
llm_refine.refine_entities on the merged candidates. Status
summary (reclassified / dropped / cancelled / batch errors)
prints to stderr.
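A hypothetical sketch of the by-name dedup from point 1; the `ProjectInfo` field names used here (`name`, `session_count`) are assumptions for illustration, not the actual module API:

```python
def dedup_projects(entries: list["ProjectInfo"]) -> list["ProjectInfo"]:
    """Union entries by project name, keeping the most active one."""
    best: dict[str, "ProjectInfo"] = {}
    for info in entries:
        cur = best.get(info.name)
        # Prefer the entry with more user-authored activity; session count
        # doubles as the density signal mentioned in the scanner notes.
        if cur is None or info.session_count > cur.session_count:
            best[info.name] = info
    return list(best.values())
```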
New init flags (opt-in, default remains zero-API):
- --llm: enable refinement
- --llm-provider: ollama (default) | openai-compat | anthropic
- --llm-model: default gemma4:e4b for Ollama
- --llm-endpoint: URL (required for openai-compat)
- --llm-api-key: falls back to env ($ANTHROPIC_API_KEY or
$OPENAI_API_KEY depending on provider)
Provider `check_available` runs before the scan, so the user sees an
immediate error ("Run: ollama pull <model>" or "ANTHROPIC_API_KEY not
set") rather than a mid-scan failure.
Pull request overview
Adds an opt-in, provider-pluggable LLM refinement step to improve mempalace init entity classification for prose-heavy corpora, and introduces a deterministic Claude Code conversation scanner to extract project names from ~/.claude/projects/ sessions (using per-session cwd metadata rather than lossy slug decoding).
Changes:
- Add `mempalace init --llm` with provider/model/endpoint/api-key flags and provider availability checks.
- Introduce `mempalace/llm_client.py` (Ollama, OpenAI-compatible, Anthropic) and `mempalace/llm_refine.py` (batching, context sampling, robust parsing, Ctrl-C partial results).
- Add `mempalace/convo_scanner.py` and wire it into `discover_entities()` for `.claude/projects/` roots; expand unit tests to cover new modules.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| uv.lock | Adds tomli marker dependency for Python < 3.11. |
| mempalace/cli.py | Wires new --llm* flags into init and constructs/checks the provider before scanning. |
| mempalace/project_scanner.py | Extends discover_entities() to optionally scan Claude projects roots and run LLM refinement. |
| mempalace/convo_scanner.py | New scanner for ~/.claude/projects/ sessions, extracting project names from JSONL cwd. |
| mempalace/llm_client.py | New minimal HTTP-only provider abstraction (ollama/openai-compat/anthropic) + availability probes. |
| mempalace/llm_refine.py | New refinement pass: batching, context collection, response parsing, merge logic, and corpus sampling. |
| tests/test_llm_client.py | Unit tests for provider factory, HTTP wrapper, and provider request/response handling. |
| tests/test_llm_refine.py | Unit tests for prompt/context building, response parsing variants, merging, batching, and Ctrl‑C partials. |
| tests/test_convo_scanner.py | Unit tests for Claude projects root detection, cwd extraction, slug fallback, and dedup/ranking. |
```python
sessions = sorted(
    (p for p in project_dir.iterdir() if p.is_file() and p.suffix == ".jsonl"),
    key=lambda p: p.stat().st_mtime,
    reverse=True,  # newest first — most likely to be well-formed
)
```
_resolve_project_name() sorts session files by p.stat().st_mtime without handling OSError. If a .jsonl file is unreadable, broken, or permission-restricted, this will raise during sorting and can abort scanning the entire Claude projects tree. Consider wrapping stat() in a safe helper (e.g., defaulting mtime to 0 on error) or filtering out paths that fail stat before sorting.
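One possible shape for that helper (the follow-up commit below names it `_safe_mtime`); defaulting to `0.0` sorts failing files to the end of the newest-first order instead of aborting the scan:

```python
from pathlib import Path


def _safe_mtime(p: Path) -> float:
    """mtime that never raises: permission errors, broken symlinks, fs races."""
    try:
        return p.stat().st_mtime
    except OSError:
        return 0.0  # sorts last under reverse=True (newest first)
```

Passing `key=_safe_mtime` to the `sorted(...)` call above leaves the rest of the logic unchanged.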
…oritative sources

Addresses issues found while reviewing the initial phase-2 implementation against real data:

**Bug: uncertain bucket starved from the LLM.** `discover_entities` was dropping the regex-uncertain bucket whenever real git/manifest signal existed — which is exactly when `--llm` is most useful for cleaning up prose noise. The uncertain candidates never reached the refinement step. Fixed: only drop when `llm_provider is None`.

**Context collection: word boundaries, not substring.** `_collect_contexts` used substring matching on lower-cased lines, so the name "Go" matched "good", "going", "forgot". Switched to a `(?<!\w)…(?!\w)` regex so short names only match at token boundaries.

**Authoritative-source detection replaces confidence threshold.** Previously the refinement step skipped entries with `confidence >= 0.95` to avoid second-guessing manifest-backed projects. That threshold was fragile — the regex detector produces 0.99 confidence for things like `code file reference (5x)` on framework names (OpenAPI, etc.), so those skipped the LLM despite being regex-only noise. New helpers `_is_authoritative_person` / `_is_authoritative_project` look at the actual signal strings (commits, package.json, etc.) to decide.

**Now also refines regex-derived people.** After #1148's high-pronoun-signal fix, the regex detector can promote non-people to the `people` bucket (e.g. a capitalized common noun that happened to appear near pronouns). The LLM now gets a chance to clean those up, while git-authored people are still skipped.

**Robust JSON extraction.** Small local models routinely wrap JSON output in prose ("Sure, here's the classification: {…}"). The previous code-fence stripper failed on that. `_extract_json_candidates` now does balanced-bracket extraction with string-aware quote handling, so it recovers JSON from:
- raw responses
- markdown fenced blocks
- JSON embedded inside surrounding text
- multiple candidate objects/arrays

**Prompt guidance for frameworks vs user projects.** Added an explicit instruction: frameworks, runtimes, APIs, cloud services, and third-party vendors (Angular, OpenAPI, Terraform, Bun, Google, etc.) are TOPIC unless the context clearly says it's the user's own codebase. Directly addresses a false-positive pattern observed during dev runs.

**Defensive mtime.** `convo_scanner._safe_mtime` catches OSError during `stat()` — permission changes, filesystem races, broken symlinks — and sorts the affected file to the end of the newest-first order rather than crashing the scan.

**Cosmetic:** merged two adjacent f-strings on the same line in `backends/chroma.py` and `llm_client.py` (no behaviour change).

15 new tests cover the OSError fallback, word-boundary matching, JSON extraction variants, authoritative-source helpers, refining high-confidence regex projects, and end-to-end LLM refinement preserving the uncertain bucket.
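A condensed sketch of the two parsing fixes, combining the word-boundary matcher and the balanced-bracket extractor; the real `_collect_contexts` / `_extract_json_candidates` handle more edge cases than shown:

```python
import json
import re


def name_pattern(name: str) -> re.Pattern:
    # (?<!\w)…(?!\w): "Go" matches "Go" but not "good", "going", "forgot"
    return re.compile(rf"(?<!\w){re.escape(name)}(?!\w)", re.IGNORECASE)


def extract_json_candidates(text: str) -> list:
    """Pull every balanced {...} or [...] span out of surrounding prose."""
    results, i = [], 0
    while i < len(text):
        if text[i] in "{[":
            depth, in_str, esc = 0, False, False
            for j in range(i, len(text)):
                c = text[j]
                if in_str:  # string-aware: ignore brackets inside JSON strings
                    if esc:
                        esc = False
                    elif c == "\\":
                        esc = True
                    elif c == '"':
                        in_str = False
                elif c == '"':
                    in_str = True
                elif c in "{[":
                    depth += 1
                elif c in "]}":
                    depth -= 1
                    if depth == 0:  # balanced span: try to parse it
                        try:
                            results.append(json.loads(text[i : j + 1]))
                        except json.JSONDecodeError:
                            pass  # not valid JSON; keep scanning
                        i = j
                        break
        i += 1
    return results
```

This recovers JSON whether the model returns it raw, fenced, wrapped in prose ("Sure, here's the classification: {…}"), or as several candidate objects.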
…to develop

MemPalace#1148, MemPalace#1150, and MemPalace#1157 were reviewed and merged on GitHub, but the two stacked children landed on their parent feature branches (now stale) rather than on develop. Only MemPalace#1148's commits reached develop via the direct merge. Release PR MemPalace#1159 (develop → main for v3.3.3) is therefore missing the LLM refinement, Claude-conversation scanner, and miner-registry wire-up that were ostensibly part of the release.

This merge brings the stale `feat/llm-entity-refine` branch (which contains the rolled-up merge commit for MemPalace#1157 → MemPalace#1150 → everything below) into develop so the release tag includes it. No code changes here — only history recovery.
Adds entries to the 3.3.3 section for the work that landed via MemPalace#1148, MemPalace#1150, MemPalace#1157, and MemPalace#1175 (rescued from stacked feature branches into develop via MemPalace#1175). Without these entries the 3.3.3 release notes on main would advertise only the hook/diary/search fixes that made it to develop through the first direct merge.

Covers:
- Manifest + git-author entity detection (MemPalace#1148)
- Regex detector accuracy improvements (MemPalace#1148)
- Optional --llm classification with Ollama / openai-compat / Anthropic provider abstraction and interactive UX (MemPalace#1150)
- Claude Code conversation scanner (MemPalace#1150)
- Init → miner registry wire-up so confirmed entities actually reach drawer metadata tagging (MemPalace#1157)
- Case-insensitive project dedup across all sources (MemPalace#1175)
- `mempalace mine` skips the generated entities.json artifact
Summary
Implements #1149 — phase-2 of the init entity detection work, stacked on top of #1148.
Adds an opt-in LLM refinement step (`mempalace init --llm`) that takes the candidate set produced by phase-1 detection and reclassifies each entity as PERSON / PROJECT / TOPIC / COMMON_WORD / AMBIGUOUS. Default behaviour is unchanged — no LLM, no network, no API keys required.

Also adds a deterministic `convo_scanner` that parses `~/.claude/projects/` session directories into project entities by reading each session's `cwd` metadata, avoiding the lossy slug-decoding problem.

Why stacked on #1148
The two PRs build incrementally: #1148 adds the deterministic phase-1 detection, and this PR layers the optional LLM refinement and conversation scanner on top. Reviewing them together is easier than reviewing #1149 in isolation. Merge order: #1148 first, then this one rebases onto develop.

What's in this PR
- `mempalace/convo_scanner.py` — Parse Claude Code conversation directories into `ProjectInfo`. Reads `cwd` from session JSONL records for accurate project names (slug decoding is lossy — can't distinguish `foo-bar` one-segment from `foo/bar` two-segments).
- `mempalace/llm_client.py` — Pluggable provider abstraction, no external SDKs (stdlib `urllib` only). `ollama` (default local, zero-API); `openai-compat` for OpenRouter, LM Studio, llama.cpp server, vLLM, Groq, Together, Fireworks; `anthropic` for the Messages API. `check_available()` probe before first use.
- `mempalace/llm_refine.py` — The refinement step. Ctrl-C partial results are safe to pass straight to `confirm_entities`. Defensive parser accepts `label`/`type` keys, case variants, top-level list vs wrapped object, markdown code fences. Unknown labels become AMBIGUOUS so the user reviews them.

CLI flags (opt-in, default zero-API): `--llm`, `--llm-provider`, `--llm-model`, `--llm-endpoint`, `--llm-api-key`.
API keys fall back to `$OPENAI_API_KEY` / `$ANTHROPIC_API_KEY` when not passed explicitly.
76 new unit tests across the three new modules, all running offline with mocked HTTP and a FakeProvider. Covers: provider factory, HTTP wrapper, and request/response handling (llm_client); prompt/context building, response parsing variants, merging, batching, and Ctrl-C partials (llm_refine); root detection, cwd extraction, slug fallback, and dedup/ranking (convo_scanner).
Full suite: all existing tests still pass; ruff clean.
Known limits / future work
Test plan
- `uv run pytest tests/ --ignore=tests/benchmarks` — full suite passes
- `ruff check mempalace/ tests/` — clean
- `ruff format --check mempalace/ tests/` — clean
- `mempalace init --help` shows all `--llm-*` options