
perf: paginate miner.status() — fixes SQLITE_MAX_VARIABLE_NUMBER crash past ~32K drawers#1036

Closed
jphein wants to merge 1 commit into MemPalace:develop from jphein:pr/miner-status-pagination

Conversation

Collaborator

@jphein jphein commented Apr 19, 2026

Closes #802 (reported 1 week ago) and #1015 (filed today — same crash, same stack location, filed after you merged #999).

The bug

total = col.count()
r = col.get(limit=total, include=["metadatas"])

On palaces with more than ~32,766 drawers, ChromaDB's underlying SQLite query binds `total` variables and exceeds `SQLITE_MAX_VARIABLE_NUMBER` (default 32,766), raising:

sqlite3.OperationalError: too many SQL variables

mempalace status crashes before printing anything on any palace over the limit. The fork has been running this paginated version for 9 days (landed on jphein/mempalace main on 2026-04-10) against a 152,682-drawer palace, so I haven't re-triggered the crash directly in that window — but the unpatched code path is reproducible by reverting this patch.
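The SQLite limit can be demonstrated directly with the standard-library sqlite3 module, independent of ChromaDB or MemPalace (a minimal sketch — the `drawers` table is made up, and 260,000 placeholders is chosen to exceed even the maximum compile-time cap of 250,000 so the error reproduces on any build, including distros that raise the default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drawers (id TEXT)")

# One bound variable per drawer id, mimicking the shape of the query
# an unbounded get() produces. 260,000 exceeds the hard compile-time
# cap on SQLITE_MAX_VARIABLE_NUMBER (250,000), so this fails
# regardless of how the local SQLite library was built.
n = 260_000
placeholders = ",".join("?" * n)
try:
    conn.execute(f"SELECT id FROM drawers WHERE id IN ({placeholders})",
                 ["x"] * n)
except sqlite3.OperationalError as e:
    print(e)  # too many SQL variables
```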

The fix

offset = 0
while offset < total:
    r = col.get(limit=10000, offset=offset, include=["metadatas"])
    metas = r.get("metadatas") or []
    if not metas:
        break
    ...
    offset += len(metas)

Each page stays well under the variable cap. The loop terminates when ChromaDB returns fewer rows than the requested batch (the final page).

Also switches printed counts to thousand-separated decimals (152,682 vs 152682) — easier to read on large palaces.
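The separator comes from Python's `,` option in the format-spec mini-language, so no extra dependency is involved:

```python
# The ',' option in an f-string format spec groups digits by thousands.
n = 152_682
print(f"{n:,}")  # 152,682
print(f"{n}")    # 152682
```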

Test plan

  • pytest tests/test_miner.py tests/test_cli.py -q — 62 passed
  • Paginated code path exercised on a 152,682-drawer palace in the fork since 2026-04-10 — status completes and prints correct per-wing/per-room counts

Notes

…h past ~32K drawers

Closes MemPalace#802 (reported 1 week ago) and MemPalace#1015 (reported today, same crash).

Before:
    total = col.count()
    r = col.get(limit=total, include=["metadatas"])

On palaces with more than ~32,766 drawers, ChromaDB's underlying
SQLite query builds a SELECT with `total` bound variables and exceeds
`SQLITE_MAX_VARIABLE_NUMBER` (default 32,766), raising:

    sqlite3.OperationalError: too many SQL variables

`mempalace status` then crashes before printing anything.

After:
    offset = 0
    while offset < total:
        r = col.get(limit=10000, offset=offset, include=["metadatas"])
        ...
        offset += len(metas)

Each page stays well under the variable cap, and the loop terminates
when ChromaDB returns fewer than the requested batch.

Also switches the printed counts to thousand-separated decimals
(`152,682` vs `152682`) — easier to read on large palaces.

Verified on a 152,682-drawer palace that previously crashed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 19, 2026 15:26
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
…emPalace#1036

Overnight/morning:
- MemPalace#681, MemPalace#1000, MemPalace#1023 merged — moved from "Still ahead" to "Merged upstream (post-v3.3.1)"
- bensig reviewed MemPalace#659 (wing_ prefix + agent filter) and MemPalace#1021 (silent_guard
  default) — both addressed on their PR branches
- MemPalace#673 needed re-rebase after overnight develop merges; done
- MemPalace#1036 filed: paginate miner.status(), closes upstream MemPalace#802 and MemPalace#1015

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes mempalace status crashing on large palaces by avoiding an unbounded col.get(limit=total, ...) call that can exceed SQLite’s SQLITE_MAX_VARIABLE_NUMBER via ChromaDB’s query plan.

Changes:

  • Paginate miner.status() metadata retrieval in 10k batches using limit/offset to stay under SQLite bind-variable limits.
  • Update status output to use thousands-separated numeric formatting for drawer and room counts.


Comment thread mempalace/miner.py
Comment on lines +856 to +868
    scanned = 0
    batch = 10000
    offset = 0
    while offset < total:
        r = col.get(limit=batch, offset=offset, include=["metadatas"])
        metas = r.get("metadatas") or []
        if not metas:
            break
        for m in metas:
            m = m or {}
            wing_rooms[m.get("wing", "?")][m.get("room", "?")] += 1
        scanned += len(metas)
        offset += len(metas)

Copilot AI Apr 19, 2026


The new pagination/offset loop isn’t covered by tests. There are existing unit tests for status() in tests/test_miner.py, but none assert multi-page behavior (e.g., that col.get() is called with offset advancing and that results across pages are fully tallied). Please add a regression test using a fake collection with count() > batch and a get(limit, offset, ...) that returns deterministic slices, then assert the output totals match the full dataset and multiple get() calls occur.
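A regression test along those lines could be sketched with a fake collection (names like `FakeCollection` and the inlined tally loop are illustrative assumptions, not the repo's actual test helpers):

```python
from collections import defaultdict

class FakeCollection:
    """In-memory stand-in for a ChromaDB collection (assumed interface)."""

    def __init__(self, metadatas):
        self._metas = metadatas
        self.get_calls = 0

    def count(self):
        return len(self._metas)

    def get(self, limit, offset=0, include=None):
        # Return a deterministic slice, like a paged backend would.
        self.get_calls += 1
        return {"metadatas": self._metas[offset:offset + limit]}

# 25,000 drawers forces three 10k pages (10k + 10k + 5k).
metas = [{"wing": f"w{i % 2}", "room": f"r{i % 5}"} for i in range(25_000)]
col = FakeCollection(metas)

# The same pagination loop as the patch, inlined for the sketch.
wing_rooms = defaultdict(lambda: defaultdict(int))
total, offset, scanned = col.count(), 0, 0
while offset < total:
    r = col.get(limit=10_000, offset=offset, include=["metadatas"])
    page = r.get("metadatas") or []
    if not page:
        break
    for m in page:
        m = m or {}
        wing_rooms[m.get("wing", "?")][m.get("room", "?")] += 1
    scanned += len(page)
    offset += len(page)

assert scanned == 25_000                # every row across pages is tallied
assert col.get_calls == 3               # multiple get() calls occurred
assert sum(sum(rooms.values()) for rooms in wing_rooms.values()) == 25_000
```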

Comment thread mempalace/miner.py

      print(f"\n{'=' * 55}")
    - print(f" MemPalace Status — {len(metas)} drawers")
    + print(f" MemPalace Status — {scanned:,} drawers")

Copilot AI Apr 19, 2026


The header prints scanned (rows successfully fetched) rather than total (from col.count()). If count() and paged get() ever disagree (concurrent deletes, backend truncation/bug), the command will silently under-report drawers. Consider printing total as the authoritative count, and if scanned != total emit a clear “partial” indicator (similar to how tool_status exposes partial).

Suggested change
    - print(f" MemPalace Status — {scanned:,} drawers")
    + print(f" MemPalace Status — {total:,} drawers")
    + if scanned != total:
    +     print(f"   partial: scanned {scanned:,} of {total:,} drawers")

jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Session-local drift since the morning update — keeps the lead
paragraph accurate for readers coming from MemPalace#1017 or MemPalace#1036.
Collaborator Author

jphein commented Apr 19, 2026

I should have checked for existing PRs before filing — @eldar702 already filed #1016 yesterday with the same pagination approach, narrower scope (pure bug fix; mine also adds thousand-separator print formatting). #1016 is the focused version and got here first. Happy to close this in favor of #1016, or rescope to just the formatting change as a follow-up — whichever the maintainers prefer.

Collaborator Author

jphein commented Apr 19, 2026

Further correction — I had actually noted #851 in earlier triage yesterday as the canonical approved fix, and forgot that context when I filed this today. @bensig's comment on #1016 confirms: #851 is approved, MERGEABLE, and additionally addresses #850. Both #1016 and this one are narrower duplicates. Closing this in favor of #851.

@jphein jphein closed this Apr 19, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
… open PRs, not 8)

MemPalace#851 (vnguyen-lexipol) was approved by bensig on 2026-04-15 and
already fixes the miner.status() SQLITE_MAX_VARIABLE_NUMBER crash
plus MemPalace#850 silent-truncation. I filed MemPalace#1036 this morning missing
that context despite having triaged MemPalace#851 the prior day. Closed
MemPalace#1036 in favor of MemPalace#851, updated lead paragraph count + Still-ahead
row + Open-PRs table accordingly.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 19, 2026
Scanned all 233 open upstream PRs today against our open PRs and
fork-ahead / planned-work items. Findings merged into README:

- P2 (decay) and P3 Tier-0 (LLM rerank): both covered by MemPalace#1032
  (@zackchiutw, MERGEABLE, 2026-04-19 — Weibull decay + 4-stage
  rerank pipeline). Older simpler version at MemPalace#337. Dropped as
  fork work; watching MemPalace#1032.
- P7 (alternative storage): formally out of scope. RFC 001 MemPalace#743
  (@igorls) defines the plugin contract; four backend PRs already
  in flight (MemPalace#700, MemPalace#381 Qdrant; MemPalace#574, MemPalace#575 LanceDB). Fork consumes,
  does not rebuild.
- P0 (multi-label tags): still fork/upstream candidate. MemPalace#1033
  (@zackchiutw) ships adjacent privacy-tag + progressive disclosure
  but not the full multi-label scheme.
- Merged MemPalace#1023 section acknowledges complementary MemPalace#976 (felipetruman)
  which adds broader mine_global_lock() + HNSW num_threads pin.

Gives future-us a map so we don't re-file MemPalace#1036-style duplicates.
jphein added a commit to jphein/mempalace that referenced this pull request Apr 22, 2026
… (2026-04-22)

Ben's batched queue-clear pass merged four PRs at 00:38 UTC:
graph cache (MemPalace#661), deterministic hook saves (MemPalace#673), Claude Code
2.1.114 hook stdout + silent_save guard (MemPalace#1021), and upstream's
own MemPalace#851 pagination fix (closing MemPalace#1036 as superseded).

Moved four rows out of the "Fork Changes" / "Fork change queue"
tables into their respective merged-upstream history sections.
Intro sentence PR count reduced from 7 → 4 open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jphein jphein deleted the pr/miner-status-pagination branch April 25, 2026 14:53


Development

Successfully merging this pull request may close these issues.

mempalace status crashes with "too many SQL variables" on large palaces

2 participants