fix: detect mtime changes in _get_collection to prevent stale HNSW index by jphein · Pull Request #663 · MemPalace/mempalace

jphein · 2026-04-12T02:48:26Z

Problem

When external tools write to the palace database (CLI mining, scripts), the
MCP server's cached ChromaDB collection becomes stale — its in-memory HNSW
index doesn't know about new vectors. Searches return incomplete results until
the MCP server process restarts.

The previous _get_collection() only invalidated the cache on process restart.
It had no mechanism to detect that the on-disk database changed.

Solution

1. mtime-based stale index detection

Track both the inode and mtime of chroma.sqlite3 in module-level globals.
On every _get_collection() call, stat the file and compare:

Inode change — catches full palace rebuilds (repair/nuke/purge), which
replace the file entirely
mtime change — catches in-place writes from CLI mining, scripts, or other
processes that append to the existing database without replacing it

Epsilon comparison (abs(current - cached) > 0.01) avoids spurious
reconnects from filesystem timestamp rounding.

FAT/exFAT filesystems return st_ino == 0 — the current_inode != 0 guard
safely skips inode detection on those filesystems.

2. `mempalace_reconnect` MCP tool

Explicit cache flush for cases where automatic detection is insufficient —
particularly metadata-only col.update() calls, which may not reliably
change mtime. Also useful in tests and scripts that need a guaranteed-fresh
connection.

Relation to #625

PR #625 (yukinoli) addresses the same stale-index problem with a different
technique: SQL COUNT(*) on the embeddings table to detect new rows since
last cache time.

This PR uses filesystem stat instead. The two approaches have different
trade-offs:

	This PR (mtime)	#625 (SQL COUNT)
Detects new embeddings	Yes (mtime changes)	Yes (count changes)
Detects metadata-only updates	Yes (mtime changes)	No (count unchanged)
Detects deletes	Yes (mtime changes)	Only if count drops
Detects external script writes	Yes	Only if they add rows
Overhead per call	`os.stat()` — ~1µs	SQL query — ~1ms
Works on all ChromaDB versions	Yes	Depends on schema

For most workloads the mtime approach has lower overhead and broader
coverage. The mempalace_reconnect tool handles the remaining edge cases
(metadata-only changes that don't touch mtime reliably).

Both approaches are valid; the maintainers can choose whichever fits the
project's direction.

Changes

mempalace/mcp_server.py
- _palace_db_inode, _palace_db_mtime module-level globals
- _get_collection(): stat chroma.sqlite3 and invalidate cache on
  inode or mtime change; update stored values after successful reconnect
- tool_reconnect(): clear all caches and force a fresh _get_collection()
- TOOLS["mempalace_reconnect"]: expose as MCP tool

Testing

ruff check passes (line length 100)
Existing test suite passes (python -m pytest tests/ -x -q)
Manually verified: mine via CLI → MCP search immediately returns new results
without server restart

Copilot

Pull request overview

This PR addresses stale ChromaDB in-memory HNSW indexes in the MCP server when chroma.sqlite3 is modified by external processes (CLI mining, scripts) by detecting on-disk changes and forcing a reconnect, and adds an explicit MCP maintenance tool to manually flush caches.

Changes:

Add inode + mtime tracking for chroma.sqlite3 to invalidate the cached collection when the underlying DB changes.
Add a new MCP tool mempalace_reconnect to force cache invalidation and reconnect.
Update the module header docs to list the new maintenance tool.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-12T02:51:53Z

+        current_inode = st.st_ino
+        current_mtime = st.st_mtime
+    except OSError:
+        current_inode = 0
+        current_mtime = 0.0
+
+    inode_changed = current_inode and current_inode != _palace_db_inode
+    mtime_changed = current_mtime and abs(current_mtime - _palace_db_mtime) > 0.01
+
+    if inode_changed or mtime_changed:


If chroma.sqlite3 disappears (or is temporarily missing during a rebuild), this code sets current_inode/current_mtime to 0 and then does not invalidate the existing _collection_cache because both inode_changed and mtime_changed become false. That can leave the server holding a cached collection pointing at a DB file that no longer exists (or has been replaced later). Consider treating an os.stat() failure as a change when a cache exists (e.g., invalidate when _palace_db_mtime/_palace_db_inode were previously non-zero, or track existence explicitly).

Suggested change

current_inode = st.st_ino

current_mtime = st.st_mtime

except OSError:

current_inode = 0

current_mtime = 0.0

inode_changed = current_inode and current_inode != _palace_db_inode

mtime_changed = current_mtime and abs(current_mtime - _palace_db_mtime) > 0.01

if inode_changed or mtime_changed:

stat_ok = True

current_inode = st.st_ino

current_mtime = st.st_mtime

except OSError:

stat_ok = False

current_inode = 0

current_mtime = 0.0

inode_changed = current_inode and current_inode != _palace_db_inode

mtime_changed = current_mtime and abs(current_mtime - _palace_db_mtime) > 0.01

db_missing_changed = (

not stat_ok

and (

_collection_cache is not None

or _palace_db_inode != 0

or _palace_db_mtime != 0.0

)

)

if inode_changed or mtime_changed or db_missing_changed:

Fixed — added explicit guard: when chroma.sqlite3 disappears and a cached collection exists, cache is cleared and inode/mtime globals reset to 0.

Copilot · 2026-04-12T02:51:53Z

+            _palace_db_inode = current_inode
+            _palace_db_mtime = current_mtime


_palace_db_inode/_palace_db_mtime are updated from the pre-connection stat values. When create=True and chroma.sqlite3 doesn’t exist yet, current_inode/current_mtime will be 0, so the cache snapshot is wrong and can force an unnecessary reconnect on the next call. Also, if the DB changes between the stat and the successful reconnect, the stored snapshot may not match the actual opened DB. Restat after successfully creating/opening the collection (or otherwise capture the post-connect inode/mtime) before updating the globals.

Suggested change

_palace_db_inode = current_inode

_palace_db_mtime = current_mtime

try:

st = os.stat(db_path)

_palace_db_inode = st.st_ino

_palace_db_mtime = st.st_mtime

except OSError:

_palace_db_inode = 0

_palace_db_mtime = 0.0

Documented — added comment explaining the benign race. Worst case is one extra reconnect on next call, which self-corrects.

Copilot · 2026-04-12T02:51:53Z

+    _palace_db_mtime = 0.0
+    try:
+        col = _get_collection()
+        count = col.count() if col else 0


tool_reconnect() currently reports success even when no collection could be opened (col is None), returning drawers=0. This makes it hard for callers to distinguish “reconnected” from “no palace / failed to connect”. Consider returning _no_palace() or success=False when _get_collection() returns None, and only reporting success after a real connection.

Suggested change

count = col.count() if col else 0

if not col:

return _no_palace()

count = col.count()

Fixed — tool_reconnect() now returns {"success": false, "message": "No palace found after reconnect"} when _get_collection() returns None.

Copilot · 2026-04-12T02:51:54Z

+    "mempalace_reconnect": {
+        "description": (
+            "Force reconnect to the palace database. Use after external scripts or CLI commands"
+            " modified the palace directly, which can leave the in-memory HNSW index stale."
+        ),
+        "input_schema": {
+            "type": "object",
+            "properties": {},
+        },
+        "handler": tool_reconnect,
+    },


New behavior (mtime/inode-based invalidation + mempalace_reconnect tool) isn’t covered by the existing MCP server tests. Since tests already exercise tools/list and tools/call dispatch, it would be good to add regression tests that (1) touch/modify chroma.sqlite3 and assert _get_collection() returns a new instance and (2) verify mempalace_reconnect appears in tools/list and clears the cache.

Fixed — added 5 tests in TestCacheInvalidation: mtime change, inode change, missing DB, reconnect failure, and reconnect success.

… tests Addresses Copilot review feedback on MemPalace#663. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

_get_collection() cached the ChromaDB collection wrapper but only invalidated on process restart. When external tools (CLI mining, scripts) wrote to the palace database in-place, the in-memory HNSW index became stale — vector searches returned stale results until the MCP server process restarted. Fix: check both inode and mtime of chroma.sqlite3 on every _get_collection() call. Inode changes catch full rebuilds (repair/nuke); mtime changes catch in-place writes. Epsilon comparison (0.01s) avoids spurious reconnects from filesystem timestamp granularity. Also adds mempalace_reconnect MCP tool for manual cache flush — useful after metadata-only updates (col.update()) that may not reliably change mtime.

… tests Addresses Copilot review feedback on MemPalace#663. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… tests Addresses Copilot review feedback on #663. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@jphein

…757) When external tools write to the palace database (CLI mining, scripts), the MCP server's cached ChromaDB collection becomes stale — its HNSW index doesn't know about new vectors. Develop already invalidates on inode changes (catches rebuilds) but not on mtime changes (misses in-place writes). This PR: - Adds st_mtime tracking alongside st_ino in _get_client; invalidates the cached client on either change. - Adds the mempalace_reconnect MCP tool for explicit cache flush. Original author: @jphein (#663). Original approval: @Ari4ka. Skips test_missing_db_invalidates_cache on Windows (ChromaDB holds chroma.sqlite3 open).

igorls · 2026-04-13T04:53:29Z

Landed on `develop` as #757 (merge `e200ce2`) — cherry-picked your two commits (authorship preserved) plus one CI-fix commit of mine:

Dropped an unused `result =` in `test_missing_db_invalidates_cache` (ruff F841).
`@pytest.mark.skipif(sys.platform == "win32", ...)` on that same test — on Windows, ChromaDB keeps `chroma.sqlite3` open through the cached client, so `os.remove()` raises `PermissionError` before the cache-invalidation path runs. Linux/macOS still exercise it.
Ruff-formatted the multi-global declaration in `_get_client`.

Closing this one; thanks for the thorough write-up comparing against #625.

Heads up — per RFC 001 (#743) §2.6, this freshness logic will migrate into `ChromaBackend.get_collection()` / `ChromaBackend.close_palace()` during the §10 cleanup. I'll add #757 to the §11 in-flight table so it's tracked during that migration.

…ed var) Post-v3.2.0 merge cleanup: - mcp_server.py: remove duplicate tool_reconnect (keep ours with broader cache clear + add upstream's None guard) - mcp_server.py: remove duplicate mempalace_reconnect TOOLS entry - miner.py: remove unused effective_min variable - CLAUDE.md: update version to 3.2.0, mark MemPalace#663 as closed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 12, 2026 02:48

jphein requested review from bensig and milla-jovovich as code owners April 12, 2026 02:48

Copilot started reviewing on behalf of jphein April 12, 2026 02:49 View session

Copilot AI reviewed Apr 12, 2026

View reviewed changes

jphein added a commit to jphein/mempalace that referenced this pull request Apr 12, 2026

fix: handle missing DB in cache, reconnect error reporting, add mtime…

f8adfea

… tests Addresses Copilot review feedback on MemPalace#663. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jphein requested a review from igorls as a code owner April 12, 2026 03:17

jphein mentioned this pull request Apr 12, 2026

feat: new MCP tools — get/list/update drawer, hook settings, export (resolves #635) #667

Merged

2 tasks

Ari4ka approved these changes Apr 12, 2026

View reviewed changes

jphein and others added 2 commits April 12, 2026 14:49

fix: handle missing DB in cache, reconnect error reporting, add mtime…

59e0449

… tests Addresses Copilot review feedback on MemPalace#663. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jphein force-pushed the fix/mcp-mtime-reconnect-v2 branch from f8adfea to 59e0449 Compare April 12, 2026 21:53

jphein mentioned this pull request Apr 13, 2026

docs: RFC 001 — storage backend plugin specification #743

Open

5 tasks

igorls pushed a commit that referenced this pull request Apr 13, 2026

fix: handle missing DB in cache, reconnect error reporting, add mtime…

9c1512d

… tests Addresses Copilot review feedback on #663. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igorls mentioned this pull request Apr 13, 2026

fix: detect mtime changes in _get_client to prevent stale HNSW index #757

Merged

5 tasks

igorls closed this in #757 Apr 13, 2026

jphein deleted the fix/mcp-mtime-reconnect-v2 branch April 19, 2026 03:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: detect mtime changes in _get_collection to prevent stale HNSW index#663

fix: detect mtime changes in _get_collection to prevent stale HNSW index#663
jphein wants to merge 2 commits intoMemPalace:developfrom
jphein:fix/mcp-mtime-reconnect-v2

jphein commented Apr 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 12, 2026

Uh oh!

jphein Apr 12, 2026

Uh oh!

Copilot AI Apr 12, 2026

Uh oh!

jphein Apr 12, 2026

Uh oh!

Copilot AI Apr 12, 2026

Uh oh!

jphein Apr 12, 2026

Uh oh!

Copilot AI Apr 12, 2026

Uh oh!

jphein Apr 12, 2026

Uh oh!

igorls commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		_palace_db_inode = current_inode
		_palace_db_mtime = current_mtime

Conversation

jphein commented Apr 12, 2026

Problem

Solution

1. mtime-based stale index detection

2. mempalace_reconnect MCP tool

Relation to #625

Changes

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

jphein Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

jphein Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

jphein Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

jphein Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

igorls commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

2. `mempalace_reconnect` MCP tool