refactor(backends/chroma): coerce None metadatas to {} at backend boundary (closes #1020)#1094
refactor(backends/chroma): coerce None metadatas to {} at backend boundary (closes #1020)#1094jphein wants to merge 2 commits intoMemPalace:developfrom
{} at backend boundary (closes #1020)#1094Conversation
There was a problem hiding this comment.
Pull request overview
Refactors the ChromaDB adapter boundary to guarantee typed QueryResult / GetResult contracts by coercing None entries inside returned metadatas into {}, preventing downstream meta.get(...) crashes caused by ChromaDB 1.5.x.
Changes:
- Coerce inner-list
Nonemetadata dicts to{}inChromaCollection.query()andChromaCollection.get(). - Add backend-level regression tests covering mixed/degenerate
Nonemetadata shapes andinclude=behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
mempalace/backends/chroma.py |
Normalizes metadatas so callers always receive list[dict] (no inner None). |
tests/test_backends.py |
Adds adapter-level tests validating the new coercion behavior for query() and get(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assert result.metadatas == [[{"wing": "w1"}, {}, {"wing": "w3"}]] | ||
| # Ids/docs/distances unaffected — their inner None handling is already | ||
| # covered by the outer-list coercion. | ||
| assert result.ids == [["a", "b", "c"]] |
There was a problem hiding this comment.
This test’s comment says ids/docs/distances are unaffected, but it never asserts result.documents. Adding a documents assertion would make the test actually cover that invariant and catch regressions in the query() return-shaping logic.
| assert result.ids == [["a", "b", "c"]] | |
| assert result.ids == [["a", "b", "c"]] | |
| assert result.documents == [["da", "db", "dc"]] |
There was a problem hiding this comment.
Done in bf91b27 — added assert result.documents == [["da", "db", "dc"]] so the comment's claim is actually pinned by the test. 12/12 backend tests pass.
…emPalace#1094 rows + MemPalace#659 CLEAN Three open-PR rows were missing (cmd_export, cmd_purge, None-metadata boundary). MemPalace#659 moved from MERGEABLE to CLEAN after the 2026-04-22 rebase onto current develop cleared its stale-merge blocker. Open-PR count: 4 → 7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ndary
ChromaDB 1.5.x can return None inside the `metadatas` list even when
the write stored a dict. Three prior PRs added per-site `meta = meta
or {}` guards to compensate:
- MemPalace#999 — guards in searcher.py (CLI + API + closet-boost),
miner.status(), and 4 mcp_server.py handlers
- MemPalace#1013 — guard in Layer3.search_raw()
- Various other paths picked up the same defensive pattern
organically (e.g. palace_graph, layers.py L1 pre-filter)
That scatter works but violates the typed contract QueryResult and
GetResult declare: both advertise `metadatas: list[dict]`, never
`list[Optional[dict]]`. Every call site that forgot the guard is a
latent `AttributeError: 'NoneType' object has no attribute 'get'`.
This PR moves the coercion to the single place the type contract is
produced — ChromaCollection.query() and ChromaCollection.get() in
backends/chroma.py. Each individual None dict gets coerced to {} at
the same boundary where the outer list is already normalized via
_none_list_to_empty(). Downstream callers receive a guaranteed
list[dict] matching the declared return shape.
## Per-site guards
Leaving the existing guards in place as defensive belt-and-suspenders.
They become redundant after this boundary coercion but removing them
risks regressing any call path that might bypass the typed wrappers.
A follow-up cleanup PR can remove them once we're confident nothing
else reaches chromadb directly.
## Tests
Six new tests in tests/test_backends.py cover the boundary's behavior
on the known None-metadata shapes:
- query(): mixed None / dict in a single batch
- query(): all-None inner list
- query(): None dicts across multiple queries in one call
- get(): mixed None / dict
- get(): padding regression (short metas list → {} pads, never None)
- get(): metadatas not requested → empty list passthrough
Full test_backends.py suite: 35 passed (6 new).
Full repo suite: 1072 passed.
Ruff format + check clean.
2806d9a to
52e376f
Compare
ChromaDB can return None for drawers without metadata (legacy data, partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094). build_graph at line 95 called meta.get("room", "") unconditionally, which AttributeErrors on None and takes out every consumer of build_graph for the whole call path: graph_stats, find_tunnels, traverse, and (most visibly) the daemon's /stats endpoint. Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test against the canonical 151K-drawer palace — /stats was 500-ing on a single None drawer. Adds `if meta is None: continue` guard. Closes the same gap upstream's MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py / miner.status, just in a different file the audit didn't reach. The graph-build is recoverable: skipping a single None drawer doesn't distort the graph since build_graph already filters `room and room != "general" and wing` — a missing-metadata drawer was never going to participate anyway. Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None entry into a 3-drawer fixture and asserts the two real drawers are processed normally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
palace_graph.build_graph None-metadata guard fix (commit 5fd15db). Caught by palace-daemon's verify-routes.sh on 2026-04-25; /stats was 500-ing on a single legacy drawer. Filed upstream as PR MemPalace#1201 — same lineage as MemPalace#999 (None-metadata audit) and MemPalace#1094 (None-coercion at backend boundary). Updates the row count: 14 merged, 8 open (added MemPalace#1201), 10 closed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ases A–C) Implements the structural fix promoted by the 2026-04-25 Cat 9 A/B (632 vs 3 tokens per query): Stop-hook auto-save checkpoint diary entries no longer share an index with content drawers. Phase A — scaffolding (no behavior change): - _SESSION_RECOVERY_COLLECTION constant + get_session_recovery_collection() in palace.py, mirroring get_collection's shape (cosine + thread-pin). - New tests/test_session_recovery.py: 3 tests covering constant, metadata, multi-collection coexistence on one palace. Phase B — write routing (the behavior change): - tool_diary_write routes topic in _CHECKPOINT_TOPICS to the recovery collection; everything else stays in the main collection. - _get_session_recovery_collection() in mcp_server.py with parallel cache. - conftest._reset_mcp_cache resets the new cache between tests. - 4 tests in TestCheckpointRouting — checkpoint→recovery, auto-save→recovery, general→main, search regression (checkpoints structurally invisible to mempalace_search now, not merely post-filtered). Phase C — new read path: - tool_session_recovery_read reads from the recovery collection only, with optional filters: session_id, agent, since, until, wing, limit. - session_id added as optional metadata field on tool_diary_write so recovery reads can filter by Claude Code session. - Defensive None-metadata handling per the MemPalace#999 / MemPalace#1094 / MemPalace#1201 family. - Registered in TOOLS dict + documented in website/reference/mcp-tools.md (test_no_undocumented_tools regression caught the missing doc; fixed). - 5 tests in TestSessionRecoveryRead — empty case, session_id filter, agent filter, limit, None-metadata defense. Phases D (migration of existing checkpoints out of main collection) and E (palace-daemon startup integration) explicitly deferred — multi-day work, gated on a separate go-ahead. Full suite 1360/1360 pass; 106 benchmark/stress deselected per default. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rry-pick of MemPalace#1094) Fork main was carrying the per-site `meta = meta or {}` guards from MemPalace#999 in eight read paths but didn't have the boundary coercion that closes the issue once for all callers. Cherry-picked MemPalace#1094 (open upstream, jp-authored) so the typed QueryResult/GetResult contract — both declare `metadatas: list[dict]`, never `list[Optional[dict]]` — is honored at the chromadb adapter boundary. Three same-family fork-ahead PRs we filed since (MemPalace#1198 _tokenize None-doc, MemPalace#1201 palace_graph None metadata, MemPalace#1199 review) all pointed at gaps that would have been impossible if this pattern had been in place. Future None-metadata bugs of the same shape are now catchable at one site instead of N. Per-site guards are kept as belt-and-suspenders for paths that might bypass the typed wrappers (e.g. legacy `import chromadb` direct calls). A future cleanup PR can remove them once we're confident no caller reaches chromadb directly. 6 new tests in test_backends.py (MemPalace#1094 originals); composes cleanly with fork main's _segment_appears_healthy + _quarantined_paths from earlier today. Full suite 1383/1383. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fork-main now carries: - MemPalace#1087 rewrite (cmd_purge: collection.delete(where=...) instead of nuke-and-rebuild) — commit 366a9ad - MemPalace#1094 cherry-pick (coerce None metadatas at chromadb boundary) — commit 43d728d YAML manifest gains 2 entries; FORK_CHANGELOG regenerated. README test count bumped 1372 → 1383. CLAUDE.md gets: - row updated for MemPalace#1087 (rewrite per @igorls's review) - new "Closed by jphein-with-triage" subsection noting MemPalace#622 closure (architectural concern resolved by MemPalace#673; verified before clicking) Triage perms make audits load-bearing. New feedback memory: feedback_audit_comments_with_triage_perms.md — verify PR/commit claims via gh before posting, gh api PATCH supports edits, track in-comment promises in scratch/promises.md. scripts/check-docs.sh ran clean across all 4 stages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChromaDB can return None for drawers without metadata (legacy data, partial writes — same root cause as upstream #1020 / our PR #1094). build_graph at line 95 called meta.get("room", "") unconditionally, which AttributeErrors on None and takes out every consumer of build_graph for the whole call path: graph_stats, find_tunnels, traverse, and (most visibly) the daemon's /stats endpoint. Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test against the canonical 151K-drawer palace — /stats was 500-ing on a single None drawer. Adds `if meta is None: continue` guard. Closes the same gap upstream's #999 None-metadata audit closed in searcher.py / mcp_server.py / miner.status, just in a different file the audit didn't reach. The graph-build is recoverable: skipping a single None drawer doesn't distort the graph since build_graph already filters `room and room != "general" and wing` — a missing-metadata drawer was never going to participate anyway. Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None entry into a 3-drawer fixture and asserts the two real drawers are processed normally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings in upstream's corpus-origin + privacy-warning track (PRs MemPalace#1211 MemPalace#1221 MemPalace#1223 MemPalace#1224 MemPalace#1225) plus the canonical merged versions of our four PRs that landed today (21:22-21:41 UTC): MemPalace#1173 quarantine_stale_hnsw on make_client + cold-start gate + integrity sniff-test (PROTO/STOP byte check, no deserialization) MemPalace#1177 .blob_seq_ids_migrated marker guard, closes MemPalace#1090 MemPalace#1198 _tokenize None-document guard in BM25 reranker MemPalace#1201 palace_graph.build_graph skips None metadata Conflict resolution: * mempalace/backends/chroma.py — took upstream as base (it has the igorls-review pickle-protocol docstring, thread-safety paragraph, and Path(marker).touch() style nit), then re-applied MemPalace#1094's _coerce_none_metas in query()/get() since MemPalace#1094 is still open and not yet in develop. * mempalace/mcp_server.py — took upstream's clean form. Dropped the fork-only `palace_path=` kwarg from four ChromaCollection() call sites: the kwarg was load-bearing for MemPalace#1171's per-collection write lock, but MemPalace#1171 closed in favor of MemPalace#976's mine_global_lock + daemon-strict, so the kwarg has no remaining consumer. ChromaCollection.__init__ in upstream/develop is back to (self, collection); calling it with palace_path= raised TypeError → silently swallowed by the broad except in _get_collection() → returned None → tool_status() returned _no_palace(). 41 mcp_server tests went from failing-with-KeyError to passing. * mempalace/cli.py — dropped fork-only `workers=args.workers` from the cmd_mine -> miner.mine() call. Pre-existing fork-side bug: the `--workers` argparse arg landed in 5cd14bd but miner.mine() never accepted a workers param, so production `mempalace mine` TypeError'd on every invocation. Removed the broken plumbing; tests/test_cli.py updated to match. * CHANGELOG.md — took upstream verbatim. Fork-specific changelog lives in FORK_CHANGELOG.md (canonical: docs/fork-changes.yaml). * CLAUDE.md — kept ours. Fork's CLAUDE.md is operational; upstream's added a "Design Principles / Contributing" charter, which lives in README.md on the fork. * tests/test_backends.py — took upstream's ruff-formatted line widths. docs/fork-changes.yaml flips the two MemPalace#1173 entries (hnsw-integrity-gate, hnsw-cold-start-gate) and the MemPalace#1201 entry (palace-graph-none-guard) from OPEN to MERGED 2026-04-26. MemPalace#1173 MemPalace#1177 MemPalace#1198 MemPalace#1201 added to the merged_upstream archive at the bottom. FORK_CHANGELOG.md regenerated. scripts/check-docs.sh: 4/4 clean. Test suite: 1460/1460. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ChromaDB can return None for drawers without metadata (legacy data, partial writes — same root cause as upstream MemPalace#1020 / our PR MemPalace#1094). build_graph at line 95 called meta.get("room", "") unconditionally, which AttributeErrors on None and takes out every consumer of build_graph for the whole call path: graph_stats, find_tunnels, traverse, and (most visibly) the daemon's /stats endpoint. Caught 2026-04-25 by palace-daemon's verify-routes.sh smoke test against the canonical 151K-drawer palace — /stats was 500-ing on a single None drawer. Adds `if meta is None: continue` guard. Closes the same gap upstream's MemPalace#999 None-metadata audit closed in searcher.py / mcp_server.py / miner.status, just in a different file the audit didn't reach. The graph-build is recoverable: skipping a single None drawer doesn't distort the graph since build_graph already filters `room and room != "general" and wing` — a missing-metadata drawer was never going to participate anyway. Test: TestBuildGraph::test_none_metadata_does_not_crash mixes a None entry into a 3-drawer fixture and asserts the two real drawers are processed normally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the test-coverage hint from @copilot-pull-request-reviewer on MemPalace#1094: the surrounding comment claimed ids/docs/distances are unaffected by metadata-coercion, but the test never asserted \`result.documents\`. Adding it locks in the invariant — a future regression in the query() return-shaping that incidentally drops or rewrites documents would now fail this test rather than slip by. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fixes #1020. Moves the
None-metadata coercion out of per-read-site defensive guards (meta = meta or {}scattered acrosssearcher.py,layers.py,palace_graph.py,miner.py, etc.) and into the single place the typed contract is produced:ChromaCollection.query()andChromaCollection.get()inbackends/chroma.py.Why
ChromaDB 1.5.x can return
Noneinside themetadataslist even when the write stored a dict. Three prior PRs already responded to this:searcher.py(CLI + API + closet-boost),miner.status(), and 4mcp_server.pyhandlersLayer3.search_raw()That scatter works but violates the typed contract
QueryResultandGetResultdeclare: both advertisemetadatas: list[dict], neverlist[Optional[dict]]. Every call site that forgets the guard is a latentAttributeError: 'NoneType' object has no attribute 'get'.A boundary coercion is the right architectural fix: one place to handle the backend's quirk, downstream code gets the shape its type annotations promise.
What changed
mempalace/backends/chroma.py: added inner-list None-coercion insideChromaCollection.query()(alongside the existing_none_list_to_emptyouter coercion) and insideChromaCollection.get()(after the existing{}-padding step). The guarantee becomes: whenevermetadatasis requested, every element is a dict — neverNone.tests/test_backends.py: +6 tests covering the shapes the production code actually sees.What did NOT change
meta = meta or {}guards stay in place as defensive belt-and-suspenders. They're redundant after this coercion but removing them risks regressing any call path that bypasses the typed wrappers. A follow-up cleanup PR can remove them once we're confident nothing else reaches chromadb directly.QueryResult/GetResultshape. Same fields, same types — just guaranteed honored at the boundary instead of honored-with-asterisks.Test plan
pytest tests/test_backends.py— 35 passed (6 new, 29 existing)pytest tests/full suite — 1072 passed, 106 deselectedruff checkon modified files — cleanruff format --checkwith pinned 0.4.10 — cleanNew test cases specifically target:
test_chroma_collection_query_coerces_none_metadatas_to_empty_dicttest_chroma_collection_query_coerces_all_none_inner_metadatastest_chroma_collection_query_coerces_none_across_multiple_queriestest_chroma_collection_get_coerces_none_metadatas_to_empty_dictget()pathtest_chroma_collection_get_coerces_padding_remains_dict{}pads stay{}test_chroma_collection_get_without_metadatas_requested_stays_emptyinclude=, don't fabricateContext
I'd committed to driving this on #1020 on 2026-04-18; #1021's merge this week cleared the blocker. @bensig — happy to split into two PRs (the coercion itself and a separate guard-removal follow-up) if you'd prefer landing them serially, but as written this PR keeps behavior strictly monotonic: everything that worked before still works, plus one class of latent None-metadata bugs can't fire anymore.