[v3.3.2] 3 bug fixes: HNSW explosion prevention, status pagination, silent_save regression #1119

@lanqiu-001

We've verified the following 3 patches work in production on Windows 11 / Python 3.12 / MemPalace 3.3.2 / ChromaDB 1.5.8. Sharing here so maintainers can consider merging, and others can self-apply while waiting for official fixes.

Fix 1: HNSW explosion prevention (related to #1091)

File: mempalace/backends/chroma.py

The root cause is ChromaDB's compactor rebuilding HNSW graphs without bound on large collections. We add two pre-write safety guards.

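A. Add the shutil import after import chromadb (the helpers in step B also call os.scandir, so make sure the module imports os as well):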

import chromadb
import shutil as _mp_shutil

B. Add helper functions before def quarantine_stale_hnsw:

def _check_disk_space(palace_path: str, required_mb: int = 500):
    """Abort writes when free disk space is below the threshold, so a runaway HNSW rebuild cannot fill the disk."""
    try:
        _, _, free = _mp_shutil.disk_usage(palace_path)
    except OSError:
        # Path missing or unreadable: skip the check rather than block the write.
        return
    free_mb = free // (1024 * 1024)
    if free_mb < required_mb:
        raise RuntimeError(
            f"Disk space critically low: {palace_path} has {free_mb}MB free "
            f"(threshold: {required_mb}MB). Write aborted to prevent HNSW explosion."
        )

def _check_segment_size(palace_path: str, max_mb_per_segment: int = 500):
    """Detect HNSW segment bloat (link_lists.bin > threshold). Issue #1091 describes
    a case where a single segment's link_lists.bin grew to 582GB. This catches it
    before the next write rather than after disk exhaustion."""
    try:
        for entry in os.scandir(palace_path):
            if not entry.is_dir():
                continue
            for f in os.scandir(entry.path):
                if f.name == "link_lists.bin":
                    size_mb = f.stat().st_size // (1024 * 1024)
                    if size_mb > max_mb_per_segment:
                        raise RuntimeError(
                            f"HNSW segment bloat detected: {f.path} = {size_mb}MB "
                            f"(threshold: {max_mb_per_segment}MB). Run: mempalace repair "
                            "or manually delete the oversized segment directory."
                        )
    except OSError:
        pass

C. In ChromaCollection.add() and ChromaCollection.upsert(), add at the top:

settings = self._collection._client.get_settings()
palace_path = settings.persist_directory
_check_disk_space(palace_path, required_mb=500)
_check_segment_size(palace_path, max_mb_per_segment=500)
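
To sanity-check the segment guard without a real palace, the same scan can be exercised against a temporary directory. The sketch below is a standalone variant, not the patch itself: find_bloated_segments and demo are hypothetical names, the function returns offenders instead of raising, and it assumes the layout the patch relies on (one subdirectory per segment, each optionally holding link_lists.bin).

```python
import os
import tempfile


def find_bloated_segments(palace_path: str, max_bytes: int) -> list[tuple[str, int]]:
    """Return (path, size_in_bytes) for every link_lists.bin larger than max_bytes.

    Hypothetical helper for illustration; assumes one subdirectory per segment.
    """
    offenders = []
    with os.scandir(palace_path) as segments:
        for entry in segments:
            if not entry.is_dir():
                continue
            candidate = os.path.join(entry.path, "link_lists.bin")
            try:
                size = os.path.getsize(candidate)
            except OSError:
                continue  # segment has no link_lists.bin or it is unreadable
            if size > max_bytes:
                offenders.append((candidate, size))
    return offenders


def demo() -> list[tuple[str, int]]:
    """Build a fake palace with one small and one oversized segment, then scan it."""
    with tempfile.TemporaryDirectory() as palace:
        for name, size in (("seg-small", 10), ("seg-big", 4096)):
            os.makedirs(os.path.join(palace, name))
            with open(os.path.join(palace, name, "link_lists.bin"), "wb") as fh:
                fh.write(b"\0" * size)
        found = find_bloated_segments(palace, max_bytes=1024)
        # Report paths relative to the palace so the result is stable across runs.
        return [(os.path.relpath(path, palace), size) for path, size in found]


if __name__ == "__main__":
    print(demo())
```

Returning the offenders rather than raising makes the scan easy to reuse in a diagnostic command; the patch above raises instead because its goal is to stop the write.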

Fix 2: mempalace status crash on large palaces (fixes #1098)

File: mempalace/miner.py, status() function

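The col.get(limit=total) call requests every record in a single query. SQLite caps the number of bind variables per statement (999 by default before SQLite 3.32.0, 32766 by default since then, and adjustable at compile time). With >30K drawers the query fails with "too many SQL variables".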

Replace:

total = col.count()
r = col.get(limit=total, include=["metadatas"]) if total else {"metadatas": []}
metas = r["metadatas"]

With:

total = col.count()
metas = []
if total:
    BATCH = 5000
    for offset in range(0, total, BATCH):
        r = col.get(limit=BATCH, offset=offset, include=["metadatas"])
        metas.extend(r["metadatas"] or [])

Note: #66 added batching elsewhere in the codebase but this location was missed and regressed.
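
The batching loop can be checked in isolation with a stand-in collection. This is a minimal sketch under stated assumptions: FakeCollection and iter_metadatas are hypothetical names, and the fake only mimics the count() and get(limit=, offset=, include=) calls used in the replacement code above, not the real ChromaDB client.

```python
from typing import Any, Iterator


class FakeCollection:
    """Stand-in exposing only count() and get(limit=, offset=, include=)."""

    def __init__(self, metadatas: list[dict[str, Any]]):
        self._metadatas = metadatas
        self.largest_limit_seen = 0  # lets the caller verify no oversized request was made

    def count(self) -> int:
        return len(self._metadatas)

    def get(self, limit: int, offset: int = 0, include=None) -> dict[str, Any]:
        self.largest_limit_seen = max(self.largest_limit_seen, limit)
        return {"metadatas": self._metadatas[offset:offset + limit]}


def iter_metadatas(col, batch_size: int = 5000) -> Iterator[dict[str, Any]]:
    """Yield metadata dicts page by page so no single query exceeds batch_size rows."""
    total = col.count()
    for offset in range(0, total, batch_size):
        page = col.get(limit=batch_size, offset=offset, include=["metadatas"])
        yield from page["metadatas"] or []


# 12,345 fake records: two full pages of 5000 plus a partial page of 2345.
col = FakeCollection([{"i": i} for i in range(12_345)])
metas = list(iter_metadatas(col))
print(len(metas), col.largest_limit_seen)
```

A generator keeps the paging logic in one place, so other call sites that still pass limit=total could reuse it instead of repeating the loop.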


Fix 3: silent_save config flag ignored (fixes #854)

File: mempalace/hooks_cli.py

MempalaceConfig.hook_silent_save is stored correctly via the MCP tool, but hook_stop() never reads it — the block fires regardless of the setting.

A. Add import at top:

from .config import MempalaceConfig

B. In hook_stop(), immediately inside the "if since_last >= SAVE_INTERVAL and exchange_count > 0:" branch, add:

cfg = MempalaceConfig()
if cfg.hook_silent_save:
    _output({})
    return
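
The effect of the guard can be illustrated with a simplified, self-contained model of the hook. Everything here other than the names taken from the patch (hook_stop, _output, hook_silent_save, SAVE_INTERVAL) is an assumption: FakeConfig stands in for MempalaceConfig, the SAVE_INTERVAL value and the "block" payload are placeholders, and the real hook_stop() takes different arguments.

```python
import io
import json
from dataclasses import dataclass

SAVE_INTERVAL = 15  # placeholder value; the real constant lives in hooks_cli.py


@dataclass
class FakeConfig:
    """Stand-in for MempalaceConfig, carrying only the flag under test."""
    hook_silent_save: bool = False


def _output(payload: dict, stream) -> None:
    """Write the hook response as JSON (the stream is injected so it can be captured)."""
    stream.write(json.dumps(payload))


def hook_stop(since_last: int, exchange_count: int, cfg: FakeConfig, stream) -> None:
    if since_last >= SAVE_INTERVAL and exchange_count > 0:
        # The patched guard: honour silent_save before emitting the block response.
        if cfg.hook_silent_save:
            _output({}, stream)
            return
        # Placeholder for whatever the real hook emits when it blocks.
        _output({"decision": "block", "reason": "placeholder save prompt"}, stream)
        return
    # Assumed behaviour when the save interval has not been reached.
    _output({}, stream)


def run(silent: bool) -> dict:
    """Invoke the hook past the save interval and return the decoded response."""
    buf = io.StringIO()
    hook_stop(since_last=20, exchange_count=3, cfg=FakeConfig(silent), stream=buf)
    return json.loads(buf.getvalue())


if __name__ == "__main__":
    print(run(silent=True), run(silent=False))
```

With the guard in place, the silent case produces an empty response while the default case still blocks, which is the behaviour the flag is meant to control.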

Environment

  • Windows 11 Pro (build 26200)
  • Python 3.12.10
  • MemPalace 3.3.2
  • ChromaDB 1.5.8

All three fixes tested and working locally.
