fix(palace_graph): skip None metadata in build_graph#1201
fix(palace_graph): skip None metadata in build_graph#1201igorls merged 1 commit intoMemPalace:developfrom
Conversation
ChromaDB can return None for drawers without metadata (legacy data, partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094). build_graph at line 95 called meta.get("room", "") unconditionally, which AttributeErrors on None and takes out every consumer of build_graph for the whole call path: graph_stats, find_tunnels, traverse, and (most visibly) the daemon's /stats endpoint. Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test against the canonical 151K-drawer palace — /stats was 500-ing on a single None drawer. Adds `if meta is None: continue` guard. Closes the same gap upstream's MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py / miner.status, just in a different file the audit didn't reach. The graph-build is recoverable: skipping a single None drawer doesn't distort the graph since build_graph already filters `room and room != "general" and wing` — a missing-metadata drawer was never going to participate anyway. Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None entry into a 3-drawer fixture and asserts the two real drawers are processed normally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
palace_graph.build_graph None-metadata guard fix (commit 5fd15db). Caught by palace-daemon's verify-routes.sh on 2026-04-25; /stats was 500-ing on a single legacy drawer. Filed upstream as PR MemPalace#1201 — same lineage as MemPalace#999 (None-metadata audit) and MemPalace#1094 (None-coercion at backend boundary). Updates the row count: 14 merged, 8 open (added MemPalace#1201), 10 closed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes a production crash in palace_graph.build_graph when ChromaDB returns None entries in metadatas (legacy/partial-write drawers), allowing graph-based endpoints like /stats to remain available even with bad rows.
Changes:
- Skip
Nonemetadata entries duringbuild_graphiteration to avoidAttributeError. - Add a regression test ensuring
build_graphcompletes and counts valid drawers correctly whenmetadatascontainsNone.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| mempalace/palace_graph.py | Adds a defensive None check while iterating drawer metadatas to prevent crashes. |
| tests/test_palace_graph.py | Adds regression coverage for mixed {dict, None, dict} metadata batches. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ases A–C) Implements the structural fix promoted by the 2026-04-25 Cat 9 A/B (632 vs 3 tokens per query): Stop-hook auto-save checkpoint diary entries no longer share an index with content drawers. Phase A — scaffolding (no behavior change): - _SESSION_RECOVERY_COLLECTION constant + get_session_recovery_collection() in palace.py, mirroring get_collection's shape (cosine + thread-pin). - New tests/test_session_recovery.py: 3 tests covering constant, metadata, multi-collection coexistence on one palace. Phase B — write routing (the behavior change): - tool_diary_write routes topic in _CHECKPOINT_TOPICS to the recovery collection; everything else stays in the main collection. - _get_session_recovery_collection() in mcp_server.py with parallel cache. - conftest._reset_mcp_cache resets the new cache between tests. - 4 tests in TestCheckpointRouting — checkpoint→recovery, auto-save→recovery, general→main, search regression (checkpoints structurally invisible to mempalace_search now, not merely post-filtered). Phase C — new read path: - tool_session_recovery_read reads from the recovery collection only, with optional filters: session_id, agent, since, until, wing, limit. - session_id added as optional metadata field on tool_diary_write so recovery reads can filter by Claude Code session. - Defensive None-metadata handling per the MemPalace#999 / MemPalace#1094 / MemPalace#1201 family. - Registered in TOOLS dict + documented in website/reference/mcp-tools.md (test_no_undocumented_tools regression caught the missing doc; fixed). - 5 tests in TestSessionRecoveryRead — empty case, session_id filter, agent filter, limit, None-metadata defense. Phases D (migration of existing checkpoints out of main collection) and E (palace-daemon startup integration) explicitly deferred — multi-day work, gated on a separate go-ahead. Full suite 1360/1360 pass; 106 benchmark/stress deselected per default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bensig
left a comment
There was a problem hiding this comment.
Approve. Same shape as #1198 — defensive skip of `None` with a real reproduction case (palace-daemon's verify-routes.sh against the canonical 151K palace, 2026-04-25).
What landed
`build_graph` now skips `if meta is None: continue` rather than crashing on `AttributeError`. The comment is the right shape — names the symptom (`/stats` 500-ing, taking out every consumer of `build_graph`), points at the upstream issue family (#1020 / #999 / #1094), and dates the catch.
Tests
`test_none_metadata_does_not_crash` exercises a 3-drawer collection where the middle drawer has `None` metadata and asserts both real drawers still get processed. Specifically asserts `nodes["auth"]["count"] == 2` so a regression that silently drops both real drawers along with the None one would catch. `pytest tests/test_palace_graph.py` — 24 passed on this branch.
Observation, not actionable
This is the third `None` defense in three review sessions (this one + #1198 BM25 tokenize + the diary_read `wing=""` from #1147). The pattern keeps recurring because Chroma's columnar return shape doesn't carry typed nulls, and downstream code keeps assuming non-None where the data layer doesn't promise it. Worth a follow-up at some point: a `@dataclass` adapter at the `ChromaCollection` boundary that fills None metadata with `{}` and None documents with `""` so consumers don't each need their own defensive layer. Out of scope for this PR.
Ship it.
…#1173 status CLAUDE.md row 28 documents the canonical-source pipeline (5a01aec) that landed earlier today. PR table updated to reflect bensig's 2026-04-26 approvals on four of our open PRs: - MemPalace#1173 (HNSW quarantine wire): approved on the original 1-commit shape, then force-pushed two safety commits (cold-start gate + integrity sniff-test) after a production cold-start incident destroyed three healthy 253MB segments. Now mergeable=CONFLICTING against develop (which moved with MemPalace#1210, MemPalace#1205, MemPalace#976) — needs rebase + re-review. - MemPalace#1177, MemPalace#1198, MemPalace#1201: approved, mergeable, awaiting merge. YAML manifest gets a new entry for the doc pipeline itself; FORK_CHANGELOG regenerated to match. promises.md (in claude-config, not this repo) got a long log entry covering today's full output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks @bensig — ready for merge whenever convenient. Tests stable on the fork's daily-driver palace + smoke pass on the develop branch. |
…rry-pick of MemPalace#1094) Fork main was carrying the per-site `meta = meta or {}` guards from MemPalace#999 in eight read paths but didn't have the boundary coercion that closes the issue once for all callers. Cherry-picked MemPalace#1094 (open upstream, jp-authored) so the typed QueryResult/GetResult contract — both declare `metadatas: list[dict]`, never `list[Optional[dict]]` — is honored at the chromadb adapter boundary. Three same-family fork-ahead PRs we filed since (MemPalace#1198 _tokenize None-doc, MemPalace#1201 palace_graph None metadata, MemPalace#1199 review) all pointed at gaps that would have been impossible if this pattern had been in place. Future None-metadata bugs of the same shape are now catchable at one site instead of N. Per-site guards are kept as belt-and-suspenders for paths that might bypass the typed wrappers (e.g. legacy `import chromadb` direct calls). A future cleanup PR can remove them once we're confident no caller reaches chromadb directly. 6 new tests in test_backends.py (MemPalace#1094 originals); composes cleanly with fork main's _segment_appears_healthy + _quarantined_paths from earlier today. Full suite 1383/1383. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings in upstream's corpus-origin + privacy-warning track (PRs MemPalace#1211 MemPalace#1221 MemPalace#1223 MemPalace#1224 MemPalace#1225) plus the canonical merged versions of our four PRs that landed today (21:22-21:41 UTC): MemPalace#1173 quarantine_stale_hnsw on make_client + cold-start gate + integrity sniff-test (PROTO/STOP byte check, no deserialization) MemPalace#1177 .blob_seq_ids_migrated marker guard, closes MemPalace#1090 MemPalace#1198 _tokenize None-document guard in BM25 reranker MemPalace#1201 palace_graph.build_graph skips None metadata Conflict resolution: * mempalace/backends/chroma.py — took upstream as base (it has the igorls-review pickle-protocol docstring, thread-safety paragraph, and Path(marker).touch() style nit), then re-applied MemPalace#1094's _coerce_none_metas in query()/get() since MemPalace#1094 is still open and not yet in develop. * mempalace/mcp_server.py — took upstream's clean form. Dropped the fork-only `palace_path=` kwarg from four ChromaCollection() call sites: the kwarg was load-bearing for MemPalace#1171's per-collection write lock, but MemPalace#1171 closed in favor of MemPalace#976's mine_global_lock + daemon-strict, so the kwarg has no remaining consumer. ChromaCollection.__init__ in upstream/develop is back to (self, collection); calling it with palace_path= raised TypeError → silently swallowed by the broad except in _get_collection() → returned None → tool_status() returned _no_palace(). 41 mcp_server tests went from failing-with-KeyError to passing. * mempalace/cli.py — dropped fork-only `workers=args.workers` from the cmd_mine -> miner.mine() call. Pre-existing fork-side bug: the `--workers` argparse arg landed in 5cd14bd but miner.mine() never accepted a workers param, so production `mempalace mine` TypeError'd on every invocation. Removed the broken plumbing; tests/test_cli.py updated to match. * CHANGELOG.md — took upstream verbatim. Fork-specific changelog lives in FORK_CHANGELOG.md (canonical: docs/fork-changes.yaml). * CLAUDE.md — kept ours. Fork's CLAUDE.md is operational; upstream's added a "Design Principles / Contributing" charter, which lives in README.md on the fork. * tests/test_backends.py — took upstream's ruff-formatted line widths. docs/fork-changes.yaml flips the two MemPalace#1173 entries (hnsw-integrity-gate, hnsw-cold-start-gate) and the MemPalace#1201 entry (palace-graph-none-guard) from OPEN to MERGED 2026-04-26. MemPalace#1173 MemPalace#1177 MemPalace#1198 MemPalace#1201 added to the merged_upstream archive at the bottom. FORK_CHANGELOG.md regenerated. scripts/check-docs.sh: 4/4 clean. Test suite: 1460/1460. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream merged MemPalace#1173, MemPalace#1177, MemPalace#1198, MemPalace#1201 into develop on 2026-04-26 (picked up by the 2026-04-27 develop sync). Strike out their fork-ahead rows in CLAUDE.md and add to the "Merged into upstream" section. Update PR table accordingly: 18 merged (was 14), 7 open (was 8). Bump version note: upstream shipped v3.3.3 on 2026-04-24 (carries our MemPalace#659/MemPalace#1021); v3.3.4 prep branch in flight. README: - "tracks upstream/develop through the 2026-04-27 sync" (was 2026-04-26) - 17 fork-ahead changes (was 22) - 1,510 tests pass on main (was 1,395 — upstream brought new suites) - "Open upstream PRs" 7 entries (was 11) - Drop merged rows from "Fork-ahead — open or pending" table; keep PreCompact recovery-marker row (still fork-only) scripts/check-docs.sh: clean (90 PR refs match upstream state, 13 fork hashes resolve, FORK_CHANGELOG renders clean from YAML). docs/fork-changes.yaml: no edit needed — already had merged_date on the 4 entries from the 2026-04-26 commit `5fd15db`.
Sync with upstream/develop via cherry-pick of 9 critical bugfixes: - HNSW index bloat prevention (MemPalace#1191) - SIGSEGV guards on collection reopen (MemPalace#1262, MemPalace#1289) - blob-seq marker fast-path (MemPalace#1177) - palace_graph None-metadata guard (MemPalace#1201) - palace_graph security tunnels (MemPalace#1168) - hyphenated wing name normalization (MemPalace#1197) - searcher _tokenize None guard (MemPalace#1198) - mcp_server diary topic sanitize (MemPalace#936) Tests: 1459 default + 113 benchmark + 6/7 stress all pass. (random-kill stress test was failing on dev pre-merge; not regressed.)
Problem
palace_graph.build_graphat line 95 callsmeta.get("room", "")unconditionally on every entry inbatch["metadatas"]. ChromaDB returnsNonefor drawers without metadata (legacy data, partial writes — same root cause as the entries fixed in #999 and #1094 in different files), and a singleNonedrawer crashes the entire graph build withAttributeError: 'NoneType' object has no attribute 'get'.This takes out every consumer of
build_graph:graph_stats,find_tunnels,traverse, and any palace-daemon-style HTTP endpoint that calls them. Caught 2026-04-25 by a smoke-test script against a 151K-drawer production palace —/statswas 500-ing on a single bad drawer.Fix
Skip
Noneentries with acontinueearly-return inside the loop. The graph build is recoverable:build_graphalready filtersroom and room != "general" and wingimmediately downstream, so a missing-metadata drawer was never going to participate in the result anyway. The skip is silent because logging every legacy-data drawer would be log spam on palaces that have many of them.Closes the same gap as
{}at backend boundary (closes #1020) #1094 (coerce None metadatas to{}at ChromaCollection boundary, MERGEABLE)palace_graph.pyis a separate read path that the #999 audit didn't reach, and #1094 (when it merges) handles this at the backend layer for new writes — but legacy drawers already in the palace need this defensive check.Tests
TestBuildGraph::test_none_metadata_does_not_crashconstructs a fixture with aNoneentry mixed between two real drawers and asserts the build completes with the two real drawers processed normally.🤖 Generated with Claude Code