-
Notifications
You must be signed in to change notification settings - Fork 6.7k
perf: paginate miner.status() — fixes SQLITE_MAX_VARIABLE_NUMBER crash past ~32K drawers #1036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -847,22 +847,32 @@ def status(palace_path: str): | |||||||||
| print(" Run: mempalace init <dir> then mempalace mine <dir>") | ||||||||||
| return | ||||||||||
|
|
||||||||||
| # Count by wing and room | ||||||||||
| # Count by wing and room. Paginate col.get() in 10K-drawer batches: | ||||||||||
| # a single col.get(limit=total) hits SQLite's SQLITE_MAX_VARIABLE_NUMBER | ||||||||||
| # (default 32,766) on palaces with many thousands of drawers — see #802, | ||||||||||
| # #1015. Pagination keeps each query under the variable cap. | ||||||||||
| total = col.count() | ||||||||||
| r = col.get(limit=total, include=["metadatas"]) if total else {"metadatas": []} | ||||||||||
| metas = r["metadatas"] | ||||||||||
|
|
||||||||||
| wing_rooms = defaultdict(lambda: defaultdict(int)) | ||||||||||
| for m in metas: | ||||||||||
| m = m or {} | ||||||||||
| wing_rooms[m.get("wing", "?")][m.get("room", "?")] += 1 | ||||||||||
| scanned = 0 | ||||||||||
| batch = 10000 | ||||||||||
| offset = 0 | ||||||||||
| while offset < total: | ||||||||||
| r = col.get(limit=batch, offset=offset, include=["metadatas"]) | ||||||||||
| metas = r.get("metadatas") or [] | ||||||||||
| if not metas: | ||||||||||
| break | ||||||||||
| for m in metas: | ||||||||||
| m = m or {} | ||||||||||
| wing_rooms[m.get("wing", "?")][m.get("room", "?")] += 1 | ||||||||||
| scanned += len(metas) | ||||||||||
| offset += len(metas) | ||||||||||
|
|
||||||||||
| print(f"\n{'=' * 55}") | ||||||||||
| print(f" MemPalace Status — {len(metas)} drawers") | ||||||||||
| print(f" MemPalace Status — {scanned:,} drawers") | ||||||||||
|
||||||||||
| print(f" MemPalace Status — {scanned:,} drawers") | |
| print(f" MemPalace Status — {total:,} drawers") | |
| if scanned != total: | |
| print(f" partial: scanned {scanned:,} of {total:,} drawers") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new pagination/offset loop isn’t covered by tests. There are existing unit tests for
status()intests/test_miner.py, but none assert multi-page behavior (e.g., thatcol.get()is called withoffsetadvancing and that results across pages are fully tallied). Please add a regression test using a fake collection withcount() > batchand aget(limit, offset, ...)that returns deterministic slices, then assert the output totals match the full dataset and multipleget()calls occur.