perf: paginate miner.status() — fixes SQLITE_MAX_VARIABLE_NUMBER crash past ~32K drawers #1036
jphein wants to merge 1 commit into MemPalace:develop
Conversation
…h past ~32K drawers

Closes MemPalace#802 (reported 1 week ago) and MemPalace#1015 (reported today, same crash).

Before:

```python
total = col.count()
r = col.get(limit=total, include=["metadatas"])
```

On palaces with more than ~32,766 drawers, ChromaDB's underlying SQLite query builds a SELECT with `total` bound variables and exceeds `SQLITE_MAX_VARIABLE_NUMBER` (default 32,766), raising:

```
sqlite3.OperationalError: too many SQL variables
```

`mempalace status` then crashes before printing anything.

After:

```python
offset = 0
while offset < total:
    r = col.get(limit=10000, offset=offset, include=["metadatas"])
    ...
    offset += len(metas)
```

Each page stays well under the variable cap, and the loop terminates when ChromaDB returns fewer than the requested batch.

Also switches the printed counts to thousand-separated decimals (`152,682` vs `152682`) — easier to read on large palaces.

Verified on a 152,682-drawer palace that previously crashed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…emPalace#1036

Overnight/morning:
- MemPalace#681, MemPalace#1000, MemPalace#1023 merged — moved from "Still ahead" to "Merged upstream (post-v3.3.1)"
- bensig reviewed MemPalace#659 (wing_ prefix + agent filter) and MemPalace#1021 (silent_guard default) — both addressed on their PR branches
- MemPalace#673 needed re-rebase after overnight develop merges; done
- MemPalace#1036 filed: paginate miner.status(), closes upstream MemPalace#802 and MemPalace#1015

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR fixes `mempalace status` crashing on large palaces by avoiding an unbounded `col.get(limit=total, ...)` call that can exceed SQLite's `SQLITE_MAX_VARIABLE_NUMBER` via ChromaDB's query plan.
Changes:
- Paginate `miner.status()` metadata retrieval in 10k batches using `limit`/`offset` to stay under SQLite bind-variable limits.
- Update status output to use thousands-separated numeric formatting for drawer and room counts.
```python
scanned = 0
batch = 10000
offset = 0
while offset < total:
    r = col.get(limit=batch, offset=offset, include=["metadatas"])
    metas = r.get("metadatas") or []
    if not metas:
        break
    for m in metas:
        m = m or {}
        wing_rooms[m.get("wing", "?")][m.get("room", "?")] += 1
    scanned += len(metas)
    offset += len(metas)
```
The new pagination/offset loop isn't covered by tests. There are existing unit tests for `status()` in `tests/test_miner.py`, but none assert multi-page behavior (e.g., that `col.get()` is called with `offset` advancing and that results across pages are fully tallied). Please add a regression test using a fake collection with `count() > batch` and a `get(limit, offset, ...)` that returns deterministic slices, then assert the output totals match the full dataset and multiple `get()` calls occur — see the sketch below.
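Something like the following would cover it. This is a minimal sketch, not a drop-in test: the `from mempalace import miner` path and the `collection=` parameter are hypothetical stand-ins for whatever seam the existing tests in `tests/test_miner.py` already use to stub ChromaDB.

```python
from mempalace import miner  # import path assumed; match the existing tests


class FakeCollection:
    """Deterministic stand-in: count() exceeds one batch, get() returns slices."""

    def __init__(self, metas):
        self._metas = metas
        self.get_calls = []  # records (limit, offset) for every page fetched

    def count(self):
        return len(self._metas)

    def get(self, limit, offset=0, include=None):
        self.get_calls.append((limit, offset))
        return {"metadatas": self._metas[offset:offset + limit]}


def test_status_tallies_all_pages(capsys):
    # 25,000 drawers forces three pages at batch=10000 (10k + 10k + 5k).
    metas = [{"wing": "east", "room": f"r{i % 4}"} for i in range(25_000)]
    col = FakeCollection(metas)

    miner.status(collection=col)  # hypothetical seam; adapt to the real fixture

    out = capsys.readouterr().out
    assert "25,000 drawers" in out  # full dataset tallied, thousand-separated
    assert [off for _, off in col.get_calls] == [0, 10000, 20000]  # offset advanced each page
```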
| print(f"\n{'=' * 55}") | ||
| print(f" MemPalace Status — {len(metas)} drawers") | ||
| print(f" MemPalace Status — {scanned:,} drawers") |
The header prints `scanned` (rows successfully fetched) rather than `total` (from `col.count()`). If `count()` and paged `get()` ever disagree (concurrent deletes, backend truncation/bug), the command will silently under-report drawers. Consider printing `total` as the authoritative count, and if `scanned != total` emit a clear "partial" indicator (similar to how `tool_status` exposes `partial`).
| print(f" MemPalace Status — {scanned:,} drawers") | |
| print(f" MemPalace Status — {total:,} drawers") | |
| if scanned != total: | |
| print(f" partial: scanned {scanned:,} of {total:,} drawers") |
Session-local drift since the morning update — keeps the lead paragraph accurate for readers coming from MemPalace#1017 or MemPalace#1036.
I should have checked for existing PRs before filing — @eldar702 already filed #1016 yesterday with the same pagination approach and a narrower scope (pure bug fix; mine also adds thousand-separator print formatting). #1016 is the focused version and got here first. Happy to close this in favor of #1016, or rescope to just the formatting change as a follow-up — whichever the maintainers prefer.
Further correction — I had actually noted #851 in earlier triage yesterday as the canonical approved fix, and forgot that context when I filed this today. @bensig's comment on #1016 confirms: #851 is approved, MERGEABLE, and additionally addresses #850. Both #1016 and this one are narrower duplicates. Closing this in favor of #851.
… open PRs, not 8)

MemPalace#851 (vnguyen-lexipol) was approved by bensig on 2026-04-15 and already fixes the `miner.status()` `SQLITE_MAX_VARIABLE_NUMBER` crash plus MemPalace#850 silent-truncation. I filed MemPalace#1036 this morning missing that context despite having triaged MemPalace#851 the prior day. Closed MemPalace#1036 in favor of MemPalace#851, updated lead paragraph count + Still-ahead row + Open-PRs table accordingly.
Scanned all 233 open upstream PRs today against our open PRs and fork-ahead / planned-work items. Findings merged into README:
- P2 (decay) and P3 Tier-0 (LLM rerank): both covered by MemPalace#1032 (@zackchiutw, MERGEABLE, 2026-04-19 — Weibull decay + 4-stage rerank pipeline). Older simpler version at MemPalace#337. Dropped as fork work; watching MemPalace#1032.
- P7 (alternative storage): formally out of scope. RFC 001 MemPalace#743 (@igorls) defines the plugin contract; four backend PRs already in flight (MemPalace#700, MemPalace#381 Qdrant; MemPalace#574, MemPalace#575 LanceDB). Fork consumes, does not rebuild.
- P0 (multi-label tags): still fork/upstream candidate. MemPalace#1033 (@zackchiutw) ships adjacent privacy-tag + progressive disclosure but not the full multi-label scheme.
- Merged MemPalace#1023 section acknowledges complementary MemPalace#976 (felipetruman) which adds broader mine_global_lock() + HNSW num_threads pin.

Gives future-us a map so we don't re-file MemPalace#1036-style duplicates.
… (2026-04-22)

Ben's batched queue-clear pass merged four PRs at 00:38 UTC: graph cache (MemPalace#661), deterministic hook saves (MemPalace#673), Claude Code 2.1.114 hook stdout + silent_save guard (MemPalace#1021), and upstream's own MemPalace#851 pagination fix (closing MemPalace#1036 as superseded). Moved four rows out of the "Fork Changes" / "Fork change queue" tables into their respective merged-upstream history sections. Intro sentence PR count reduced from 7 → 4 open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #802 (reported 1 week ago) and #1015 (filed today — same crash, same stack location, filed after you merged #999).
The bug
On palaces with more than ~32,766 drawers, ChromaDB's underlying SQLite query binds `total` variables and exceeds `SQLITE_MAX_VARIABLE_NUMBER` (default 32,766), raising `sqlite3.OperationalError: too many SQL variables`. `mempalace status` crashes before printing anything on any palace over the limit. The fork has been running this paginated version for 9 days (landed on jphein/mempalace main on 2026-04-10) against a 152,682-drawer palace, so I haven't re-triggered the crash directly in that window — but the unpatched code path is reproducible by reverting this patch.
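For reviewers who want to see the failure class without a 32K-drawer palace, here is a minimal standalone repro of the SQLite cap itself: plain `sqlite3`, no ChromaDB. It assumes a stock SQLite build with the default `SQLITE_MAX_VARIABLE_NUMBER` of 32,766; some distro builds raise the cap, in which case bump the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drawers (id INTEGER PRIMARY KEY)")

ids = list(range(40_000))  # any count above 32,766 trips the default cap
placeholders = ",".join("?" * len(ids))

# Raises sqlite3.OperationalError: too many SQL variables
conn.execute(f"SELECT id FROM drawers WHERE id IN ({placeholders})", ids)
```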
The fix

Each page stays well under the variable cap. The loop terminates when ChromaDB returns fewer than the requested batch (final page).
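For reference, the shape of the new loop, copied from the commit message (the `...` elides the per-page tally shown in the diff above):

```python
offset = 0
while offset < total:
    r = col.get(limit=10000, offset=offset, include=["metadatas"])
    ...
    offset += len(metas)
```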
Also switches printed counts to thousand-separated decimals (`152,682` vs `152682`) — easier to read on large palaces.

Test plan
- `pytest tests/test_miner.py tests/test_cli.py -q` — 62 passed
- `status` completes and prints correct per-wing/per-room counts

Notes
- `miner.py::status()` only. Same underlying issue may exist in other `col.get(limit=total)` call sites; happy to file follow-ups if you confirm you want this scope.
- `mempalace status` crashes on palaces >~32K drawers (SQLite variable limit) #1015 duplicates `mempalace status` crashes with "too many SQL variables" on large palaces #802, and the fix is simple enough to merge independently.