feat(graph): cross-wing tunnels by shared topics (#1180) by igorls · Pull Request #1184 · MemPalace/mempalace

igorls · 2026-04-24T22:20:39Z

Summary

Implements #1180: when two wings share one or more confirmed TOPIC labels (the LLM-refine bucket from mempalace init --llm), the miner automatically drops a symmetric tunnel between them at mine time. This surfaces meaningful cross-wing connections via the palace graph for projects that share frameworks, vendors, or recurring concepts.

Design decisions for the tradeoffs called out in the issue

Signal source: TOPIC labels only. Manifest-dependency overlap is intentionally out of scope for v1 — it's harder signal with a narrower scope, and the LLM-refined topic set already captures the framework / vendor case the issue motivates. Worth a separate PR.
When to compute: mine-time, post-file-loop. Cheaper than search-time (graph changes infrequently) and avoids the freshness lag of a nightly job. Tunnel creation runs after every file in the current wing has been processed and degrades quietly on failure (warning printed, mine still succeeds).
Threshold: simple per-pair shared-topic count. Default 1 (any single shared topic is enough). Configurable via MEMPALACE_TOPIC_TUNNEL_MIN_COUNT env var or topic_tunnel_min_count in ~/.mempalace/config.json. Bump to 2+ for users whose projects share lots of common-tech labels (Python, Docker, Git) and only want meaningfully overlapping wings to link.
Persistence: extend ~/.mempalace/known_entities.json. Topics are persisted under a new topics_by_wing map keyed by wing name; the per-project entities.json gains a topics list as audit trail. The miner's SKIP_FILENAMES already excludes entities.json, so no churn there. Wing names in topics_by_wing are explicitly not surfaced through the flat known-name set (_load_known_entities) used by drawer-tagging.
User control: out of scope for v1. Per-topic allow/deny lists are a future idea — adding one wrong topic at worst creates a low-traffic tunnel and never alters drawer storage, so the cost of a bad classification is small.
Display: not in this PR. Tunnels being present in the graph is enough — they're retrievable via the existing list_tunnels / follow_tunnels API. Surfacing them in search results is a follow-up.

Behavior changes

mempalace init --llm: confirmed topics are now persisted (previously TOPIC classifications collapsed into the uncertain bucket and were dropped when the user accepted the defaults).
mempalace mine <project>: after the file loop, computes topic tunnels for the freshly-mined wing against every other wing recorded in topics_by_wing. Prints Topic tunnels: +N cross-wing link(s) when tunnels are created. Without --llm confirmed topics, this is a no-op.
palace_graph: gains compute_topic_tunnels(topics_by_wing, min_count, label_prefix) and topic_tunnels_for_wing(wing, ...). Both route through the existing create_tunnel API so storage, dedup, and the list_tunnels / follow_tunnels / delete_tunnel paths just work.

Test plan

tests/test_palace_graph_tunnels.py::TestTopicTunnels — 9 unit tests covering shared-topic link, threshold honored, threshold blocks single overlap, case-insensitive overlap, multi-topic per-pair links, three-wings pairwise expansion, topic_tunnels_for_wing only links the named wing, dedup on recompute, empty-input no-op.
tests/test_known_entities_registry.py — 7 new tests covering topics_by_wing persistence, replace-on-reinit, multi-wing coexistence, case-insensitive dedup, get_topics_by_wing reader, no-leak of wing names into the flat known-set, and the wing-less code path.
tests/test_miner.py — 3 end-to-end tests exercising the full mine-time flow: tunnel created when shared topic exists, no tunnel when threshold blocks the overlap, no tunnel when only one wing has topics.
tests/test_llm_refine.py — updated TOPIC routing test to assert the new topics bucket (separate from uncertain) and added an AMBIGUOUS assertion to lock in the still-uncertain behavior for that label.
Full suite passes locally: 1255 passed.
ruff check + ruff format --check clean against the CI-pinned ruff>=0.4.0,<0.5.
No real names, real paths, or private data introduced — all test fixtures use generic placeholders (wing_alpha, wing_beta, Alice, Bob, foo, bar, Angular, OpenAPI).

Copilot

Pull request overview

Adds automatic cross-wing “topic tunnels” so wings that share confirmed LLM-refined TOPIC labels become connected in the palace graph, enabling discovery of related content across projects.

Changes:

Introduces topic-tunnel computation in palace_graph and runs it at mine-time with a configurable overlap threshold.
Extends the entity/LLM-refine pipeline to preserve a dedicated topics bucket and persists per-wing topics in the known-entities registry.
Adds configuration knob + changelog entry and expands test coverage across graph, miner, registry, and LLM refine flows.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`mempalace/palace_graph.py`	Implements topic tunnel computation APIs built on `create_tunnel`.
`mempalace/miner.py`	Persists `topics_by_wing`, exposes `get_topics_by_wing`, and computes tunnels post-mine.
`mempalace/cli.py`	Persists topics on `init` and writes them into the registry for later mine-time tunnel creation.
`mempalace/llm_refine.py`	Routes `TOPIC` classifications into a dedicated `topics` bucket.
`mempalace/entity_detector.py`	Threads `topics` through confirmation output and display.
`mempalace/project_scanner.py`	Ensures discovered-entity dict shape always includes `topics` and merges it.
`mempalace/config.py`	Adds `topic_tunnel_min_count` config/env property.
`CHANGELOG.md`	Documents the new feature and configuration knobs.
`tests/*`	Adds/updates tests for new dict shape, topic routing, registry persistence, mine-time behavior, and graph tunnel creation/dedup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-24T22:25:33Z

+def _normalize_topic(name: str) -> str:
+    """Lowercase + strip topics for case-insensitive overlap detection."""
+    return str(name).strip().lower()
+
+
+def compute_topic_tunnels(
+    topics_by_wing: dict,
+    min_count: int = 1,
+    label_prefix: str = "shared topic",
+) -> list[dict]:
+    """Create tunnels for every pair of wings that share >= ``min_count`` topics.
+
+    Args:
+        topics_by_wing: ``{wing_name: [topic_name, ...]}`` mapping. Topic
+            names are compared case-insensitively; the first observed
+            casing is used for the tunnel room name.
+        min_count: minimum number of overlapping topics required to drop
+            any tunnel between a wing pair. ``1`` means a single shared
+            topic is enough; bumping to e.g. ``2`` requires multiple
+            overlaps and filters out coincidental single-topic links.
+        label_prefix: human-readable string prefixed to the tunnel label.
+
+    Returns:
+        List of tunnel dicts as returned by ``create_tunnel`` — one per
+        (wing_a, wing_b, topic) triple that crossed the threshold. A
+        wing-pair below ``min_count`` produces no tunnels at all (not
+        even for its single shared topic).
+
+    No-op semantics:
+      - empty/None ``topics_by_wing`` returns ``[]``.
+      - wings whose topic list is empty are skipped.
+      - ``min_count <= 0`` is clamped to 1.
+    """
+    if not topics_by_wing:
+        return []
+
+    min_count = max(1, int(min_count))
+
+    # Build a normalized-topic -> first-seen casing map per wing so we
+    # preserve display casing while still doing case-insensitive overlap.
+    wing_topics: dict[str, dict[str, str]] = {}
+    for wing, names in topics_by_wing.items():
+        if not isinstance(wing, str) or not wing.strip():
+            continue
+        if not isinstance(names, (list, tuple)):
+            continue
+        bucket: dict[str, str] = {}
+        for n in names:
+            if not isinstance(n, str):
+                continue
+            key = _normalize_topic(n)
+            if not key:
+                continue
+            bucket.setdefault(key, n.strip())
+        if bucket:
+            wing_topics[wing.strip()] = bucket
+
+    wings = sorted(wing_topics.keys())
+    created: list[dict] = []
+    for i, wa in enumerate(wings):
+        topics_a = wing_topics[wa]
+        for wb in wings[i + 1 :]:
+            topics_b = wing_topics[wb]
+            shared_keys = set(topics_a.keys()) & set(topics_b.keys())
+            if len(shared_keys) < min_count:
+                continue
+            # Stable sort for deterministic tunnel ordering across runs.
+            for key in sorted(shared_keys):
+                # Prefer the casing from whichever wing sorts first — both
+                # are valid; this just keeps the displayed room consistent.
+                room = topics_a[key] if topics_a[key] else topics_b[key]
+                tunnel = create_tunnel(
+                    source_wing=wa,
+                    source_room=room,
+                    target_wing=wb,
+                    target_room=room,
+                    label=f"{label_prefix}: {room}",
+                )


compute_topic_tunnels uses the raw topic string as room when calling create_tunnel. Topic values coming from LLM/user data can contain /, \\, .., or other characters; this is problematic because tunnel IDs are derived from _endpoint_key(f"{wing}/{room}") (so / in a topic can cause endpoint-key ambiguity/collisions) and MCP tools sanitize room via sanitize_name (so tunnels with punctuation/slashes may be impossible to follow via the API). Consider sanitizing/encoding topic names into a safe room identifier (and keeping the human-readable topic in the tunnel label), or hard-validating topics against the same sanitize_name constraints and skipping invalid ones with a warning.

Copilot · 2026-04-24T22:25:33Z

+            # Wing matches the default produced by ``room_detector_local``
+            # (folder basename) and the miner fallback in ``load_config``.
+            # Used by the topics_by_wing map so cross-wing tunnels can be
+            # computed at mine time.
+            wing = project_path.name
+            registry_path = add_to_known_entities(confirmed, wing=wing)


cmd_init stores topics under topics_by_wing using wing = project_path.name, but mempalace mine uses the wing from mempalace.yaml (or override). If a project has a configured wing that differs from the folder basename, topics will be recorded under the wrong key and mine-time tunnel computation will never see them. Consider deriving the wing the same way mine() does (e.g., read mempalace.yaml/mempal.yaml if present, falling back to basename) so init + mine agree on the wing key.

Copilot · 2026-04-24T22:25:33Z

+def test_mine_creates_topic_tunnels_for_shared_topics(tmp_path, monkeypatch):
+    """End-to-end: when two wings have already-confirmed topics that overlap,
+    the miner's mine-time pass drops a cross-wing tunnel between them.
+
+    Issue #1180.
+    """
+    from mempalace import miner, palace_graph
+
+    # Redirect both the registry and tunnel-storage paths into tmp_path
+    # so we never touch the developer's real ~/.mempalace directory.
+    registry = tmp_path / "known_entities.json"
+    monkeypatch.setattr(miner, "_ENTITY_REGISTRY_PATH", str(registry))
+    miner._ENTITY_REGISTRY_CACHE.update({"mtime": None, "names": frozenset(), "raw": {}})
+    tunnels_file = tmp_path / "tunnels.json"
+    monkeypatch.setattr(palace_graph, "_TUNNEL_FILE", str(tunnels_file))
+
+    # Pre-populate the registry as if init had been run for two wings that
+    # share a topic.
+    miner.add_to_known_entities({"topics": ["foo", "bar"]}, wing="wing_one")
+    miner.add_to_known_entities({"topics": ["foo", "baz"]}, wing="wing_two")
+
+    # Mine wing_two — should drop tunnels between wing_two and wing_one
+    # for every shared topic. Just one in this case.
+    project_root = tmp_path / "wing_two_project"
+    project_root.mkdir()
+    write_file(
+        project_root / "notes.md",
+        "Some prose long enough to make a chunk. " * 20,
+    )
+    with open(project_root / "mempalace.yaml", "w") as f:
+        yaml.dump({"wing": "wing_two", "rooms": [{"name": "general"}]}, f)
+
+    palace_path = tmp_path / "palace"
+    mine(str(project_root), str(palace_path))
+


These new end-to-end miner tests rely on the default topic_tunnel_min_count being 1, but _compute_topic_tunnels_for_wing reads MempalaceConfig() which can pick up a developer's real ~/.mempalace/config.json when running tests locally. That can make the tests flaky outside CI. Consider pinning the value in these tests (e.g., monkeypatch.setenv("MEMPALACE_TOPIC_TUNNEL_MIN_COUNT", "1") or monkeypatching HOME / using a temp config dir) so they don't depend on external user config.

bensig

Reviewed locally on `feat/cross-wing-topic-tunnels`. Approve, with one mechanical ask + one design question.

Full pytest on this branch: 1255 passed in 42s — +8 over #1183 baseline, mapping to the new tunnel + registry tests this PR adds.

Mechanical ask: rebase

The PR is showing CONFLICTING. Real conflict surface is small — one file:

`mempalace/config.py` — both this PR and #1185 added new properties to `MempalaceConfig` (your `embedding_device` from #1185 vs `topic_tunnel_min_count` here). Trivial resolution; both properties just need to coexist.

`tests/test_miner.py` and `mempalace/miner.py` both auto-merged cleanly when I tried locally, so the actual rebase is one config.py merge.

Design question (not a blocker)

Topic-name-as-room semantics. `compute_topic_tunnels` uses the topic itself (e.g. `"Angular"`) as the `room` field on both ends of the tunnel:

```python
tunnel = create_tunnel(
source_wing=wa, source_room=room, # room = topic name
target_wing=wb, target_room=room,
label=f"{label_prefix}: {room}",
)
```

If a project happens to have a literal `Angular` room from autodetected folder structure, the tunnel target collides with that real room. Is the intent that `follow_tunnels` / `list_tunnels` callers can transparently traverse from drawers in a real "Angular" room into topic-tunnel-linked wings, or are these meant as virtual graph-only connections? Either is fine — just want to confirm you considered the case where topic strings collide with structural room names.

If they're meant as virtual: a `label_prefix`-style namespace on the room field (e.g. `"topic:Angular"`) would make the distinction visible at `list_tunnels` read time. If they're meant to merge with literal rooms: the current shape is exactly right and ignore me.

Strengths

Out-of-scope decisions are explicit and defensible. Manifest-dep overlap deferred, per-topic allow/deny lists deferred, search-result surfacing deferred. Each labeled as future work with the rationale ("a wrong topic at worst creates a low-traffic tunnel and never alters drawer storage").
Mine-time, not search-time. Cheaper, no freshness lag, degrades quietly on failure (`try/except` prints warning, mine still succeeds with the right exit code).
Defensive type-checking everywhere. `isinstance` guards on wing names, topic lists, the `topics_by_wing` structure. Won't crash on malformed registry data — important because users can hand-edit `known_entities.json`.
Reuses existing tunnel storage. `create_tunnel` / `list_tunnels` / `follow_tunnels` / `delete_tunnel` paths all just work. No new graph code to maintain past this PR.
`topics_by_wing` correctly namespaced from flat known_entities. The `_refresh_known_entities_cache` change skips the wing-keyed map when building the flat name set — wing names don't leak into drawer-tagging. The inline comment makes the intent obvious to the next reader.
Replace-not-union semantics for topics. Re-running `init` reflects the user's latest confirmation rather than accumulating stale labels. Right call; users would otherwise get a one-way ratchet of topic counts.
Threshold knob. `topic_tunnel_min_count` defaults to 1, configurable via env or config. Users with lots of common-tech labels (Python, Docker, Git) can bump to 2+ for meaningful overlaps only. Documented in the property docstring.
Casing preserved from "first seen" while overlap detection is case-insensitive. Won't show jarring case mismatches in tunnel labels (`"angular"` from one wing meeting `"Angular"` from another renders consistently).
Test coverage is comprehensive. 9 tunnel tests + 7 registry tests + new test_miner cases. The cases I'd most want covered (dedup on recompute, threshold blocks single overlap, three-wing pairwise, case-insensitive overlap, empty-input no-op) are all there.

Minor (not blockers)

`_compute_topic_tunnels_for_wing` reads `MempalaceConfig()` every mine call. Cheap (file read with caching), but technically re-reads if mine is called multiple times in one session. Negligible.
The mine-time tunnel print is a single line, only fires when tunnels were actually created. Quiet by default — good.

Once rebased, ship it. The design question above is genuinely a question, not a blocker — your answer determines whether to refactor the room field, but either answer is defensible.

When two wings have one or more confirmed TOPIC labels in common, the miner now drops a symmetric tunnel between them at mine time so the palace graph reflects shared themes (frameworks, vendors, recurring concepts). - llm_refine: TOPIC label routes to a dedicated `topics` bucket so the signal survives confirmation instead of getting collapsed into `uncertain` and dropped. - entity_detector / project_scanner: bucket plumbed through the detection pipeline; `confirm_entities` returns confirmed topics alongside people/projects. - miner.add_to_known_entities: optional `wing` parameter records the confirmed topics under `topics_by_wing` in `~/.mempalace/known_entities.json`. Wing names do NOT leak into the flat known-name set used by drawer-tagging. - palace_graph: `compute_topic_tunnels` and `topic_tunnels_for_wing` create symmetric tunnels via the existing `create_tunnel` API so they share dedup and persistence with explicit tunnels. - miner.mine: post-file-loop pass calls `topic_tunnels_for_wing` for the freshly-mined wing. Failures are logged but never abort the mine. - config: `topic_tunnel_min_count` knob (env `MEMPALACE_TOPIC_TUNNEL_MIN_COUNT` or `~/.mempalace/config.json`), default 1. Tests cover topic persistence through init->mine, tunnel creation when wings share a topic, no tunnel below threshold, cross-wing tunnel retrieval via `list_tunnels`, dedup on recompute, case-insensitive overlap, and the end-to-end mine-time wiring. Out of scope for this PR (called out in the PR body): manifest- dependency overlap, per-topic allow/deny lists, search-result surfacing.

… field Previously a cross-wing topic tunnel for "Angular" stored the room as "Angular" — colliding with a wing's literal folder-derived "Angular" room at follow_tunnels/list_tunnels read time, and exposing raw topic strings (which may contain characters rejected by sanitize_name) to the MCP surface. Topic tunnels now store their room as "topic:<original-casing>" and carry kind="topic" on the stored dict. Explicit tunnels get kind="explicit" (default). follow_tunnels("wing", "Angular") on a literal Angular room no longer surfaces topic connections for the same name, and any LLM scanning list_tunnels has a visible discriminator.

igorls · 2026-04-25T02:10:19Z

@bensig went with the namespaced topic: room + kind field (option A in your review); see 865a36b.

…HNSW fixes Bring in 29 commits from upstream/develop since the last merge (2026-04-23): Major absorbed changes: - MemPalace#976 (Felipe Truman): HNSW graph corruption fix, mine_global_lock for fan-out, MAX_PRECOMPACT_BLOCK_ATTEMPTS=2 for /compact deadlocks. Closes MemPalace#974/MemPalace#965/MemPalace#955; likely resolves MemPalace#1172 too. - MemPalace#1179 (Igor): CLI mempalace search routes through _hybrid_rank, legacy-metric warning + _warn_if_legacy_metric, invariant tests on hnsw:space=cosine across all 5 collection-creation paths. - MemPalace#1180/MemPalace#1183/MemPalace#1184: cross-wing topic tunnels, init mine UX, --auto-mine. - MemPalace#1185 (perf/batched-upsert-gpu): batched ChromaDB upserts, GPU device detection via mempalace/embedding.py. - MemPalace#1182: graceful Ctrl-C during mempalace mine. - MemPalace#1168: tunnel permissions security fix. Conflict resolutions (8 files): - searcher.py: kept fork's CLI delegation through search_memories (cleaner single-source-of-truth path); upstream's inline-retrieval branch dropped. Merged upstream's print formatting (cosine= + bm25=) with fork's matched_via reporting from sqlite BM25 fallback. - backends/chroma.py: kept fork's _BLOB_FIX_MARKER + palace_path arg to ChromaCollection (MemPalace#1171 write lock); merged upstream's **ef_kwargs (embedding_function support from MemPalace#1185). Removed duplicate _pin_hnsw_threads (fork already cherry-picked Felipe's earlier). - mcp_server.py: kept fork's palace_path arg + upstream's clearer comment on hnsw:num_threads=1 rationale. - miner.py: took upstream's serial mine() flow (mine_global_lock + Ctrl-C), RESTORED fork's strict detect_room — substring matching from upstream breaks fork-only test_detect_room_priority1_no_substring_match. Added `import re` for word-boundary regex matching. Fork-ahead concurrent mining (workers=, ThreadPoolExecutor from 5cd14bd) is dropped — daemon migration deprioritizes local concurrent mining; can re-port if needed. - CHANGELOG.md: combined fork's segfault-trio narrative with upstream's v3.3.4 release notes. - HOOKS_TUTORIAL.md: took MEMPALACE_PYTHON env var name (fork README was stale; hooks already use this name per fork-ahead MemPalace#19). - test_convo_miner_unit.py: took both contextlib + pytest imports. - test_searcher.py: imported upstream's 3 new TestSearchCLI tests but skipped them with TODOs — they assume upstream's inline-retrieval CLI with simpler mocks. Rewrite needed for fork's delegated search_memories path (sqlite BM25 fallback + scope counting). Test result: 1334 passed, 3 skipped (the upstream tests above), 3 warnings. Implications for open fork PRs: - MemPalace#1171 (cross-process write lock at adapter) becomes structurally redundant given MemPalace#976's mine_global_lock + daemon-strict serialization. Slated for close with thank-you to Felipe. - MemPalace#1173/MemPalace#1177 still relevant (quarantine threshold + blob_seq marker). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 24, 2026 22:20

igorls requested review from bensig and milla-jovovich as code owners April 24, 2026 22:20

Copilot started reviewing on behalf of igorls April 24, 2026 22:21 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

bensig approved these changes Apr 25, 2026

View reviewed changes

igorls added 2 commits April 24, 2026 23:06

igorls force-pushed the feat/cross-wing-topic-tunnels branch from 809ae61 to 865a36b Compare April 25, 2026 02:07

igorls merged commit 8d49b00 into develop Apr 25, 2026
6 checks passed

igorls deleted the feat/cross-wing-topic-tunnels branch April 25, 2026 02:26

bensig mentioned this pull request Apr 25, 2026

Topic tunnels silently skipped for wings with hyphenated dir names #1194

Closed

igorls mentioned this pull request Apr 27, 2026

fix(tunnels): normalize wing names in topic tunnel lookup for hyphenated dirs #1197

Merged

claudiusthebot mentioned this pull request May 1, 2026

docs(mempalace): document tunnel + drawer + maintenance tools, note 3.3.4 auto-tunnels dylanneve1/talon#100

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(graph): cross-wing tunnels by shared topics (#1180)#1184

feat(graph): cross-wing tunnels by shared topics (#1180)#1184
igorls merged 2 commits intodevelopfrom
feat/cross-wing-topic-tunnels

igorls commented Apr 24, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

Copilot AI Apr 24, 2026

Uh oh!

bensig left a comment

Uh oh!

igorls commented Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

igorls commented Apr 24, 2026

Summary

Design decisions for the tradeoffs called out in the issue

Behavior changes

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

bensig left a comment

Choose a reason for hiding this comment

Mechanical ask: rebase

Design question (not a blocker)

Strengths

Minor (not blockers)

Uh oh!

igorls commented Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants