Commit 2b2118c

fix: restore mempalace compress after stats rename (MemPalace#159)
The honest-stats rename in PR MemPalace#147 changed the keys returned by Dialect.compression_stats() (ratio -> size_ratio, compressed_chars -> summary_chars, original_tokens / compressed_tokens -> original_tokens_est / summary_tokens_est). cmd_compress still reads the old names, so mempalace compress throws KeyError on the first drawer it touches and the feature is effectively dead.

Also fix the summary line at the bottom of cmd_compress. It called count_tokens("x" * total_original), but count_tokens is word-based (max(1, int(len(text.split()) * 1.3))), and a string of repeated xs is a single "word", so both totals were always 1. Accumulate the per-drawer estimates during the main loop instead, and use a token-based ratio so the summary line is self-consistent with the per-drawer dry-run output.

The storage metadata key names on the compressed collection (compression_ratio, original_tokens) stay the same for compatibility with anything already reading them. Only the source of the values is updated.

Fixes MemPalace#159 (points 1 and 2)
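The summary-line bug described in the commit message can be reproduced in isolation. The sketch below re-implements count_tokens from the formula quoted in the message (the real method lives on Dialect and may differ in detail); the input strings are made up for illustration.

```python
def count_tokens(text: str) -> int:
    """Word-based token estimate, copied from the formula in the commit message."""
    return max(1, int(len(text.split()) * 1.3))


# The old summary built a string of repeated "x" characters. It contains no
# whitespace, so split() sees one word and the estimate collapses to 1
# regardless of length.
old_total = count_tokens("x" * 50_000)

# Ordinary prose is counted per word: 9 words * 1.3, truncated to an int.
prose = count_tokens("the quick brown fox jumps over the lazy dog")

print(old_total)  # 1
print(prose)      # 11
```

Because the character count never reaches the estimator as separate words, the only way to get a meaningful total is to sum the per-drawer estimates, which is what the diff does.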
1 parent 0fdd086 commit 2b2118c

1 file changed

Lines changed: 11 additions & 12 deletions

mempalace/cli.py

@@ -309,16 +309,16 @@ def cmd_compress(args):
     )
     print()
 
-    total_original = 0
-    total_compressed = 0
+    total_orig_tokens = 0
+    total_comp_tokens = 0
     compressed_entries = []
 
     for doc, meta, doc_id in zip(docs, metas, ids):
         compressed = dialect.compress(doc, metadata=meta)
         stats = dialect.compression_stats(doc, compressed)
 
-        total_original += stats["original_chars"]
-        total_compressed += stats["compressed_chars"]
+        total_orig_tokens += stats["original_tokens_est"]
+        total_comp_tokens += stats["summary_tokens_est"]
 
         compressed_entries.append((doc_id, compressed, meta, stats))

@@ -328,7 +328,8 @@ def cmd_compress(args):
             source = Path(meta.get("source_file", "?")).name
             print(f" [{wing_name}/{room_name}] {source}")
             print(
-                f" {stats['original_tokens']}t -> {stats['compressed_tokens']}t ({stats['ratio']:.1f}x)"
+                f" {stats['original_tokens_est']}t -> {stats['summary_tokens_est']}t "
+                f"({stats['size_ratio']:.1f}x)"
             )
             print(f" {compressed}")
             print()
@@ -339,8 +340,8 @@ def cmd_compress(args):
             comp_col = client.get_or_create_collection("mempalace_compressed")
             for doc_id, compressed, meta, stats in compressed_entries:
                 comp_meta = dict(meta)
-                comp_meta["compression_ratio"] = round(stats["ratio"], 1)
-                comp_meta["original_tokens"] = stats["original_tokens"]
+                comp_meta["compression_ratio"] = round(stats["size_ratio"], 1)
+                comp_meta["original_tokens"] = stats["original_tokens_est"]
                 comp_col.upsert(
                     ids=[doc_id],
                     documents=[compressed],
@@ -353,11 +354,9 @@ def cmd_compress(args):
             print(f" Error storing compressed drawers: {e}")
             sys.exit(1)
 
-    # Summary
-    ratio = total_original / max(total_compressed, 1)
-    orig_tokens = Dialect.count_tokens("x" * total_original)
-    comp_tokens = Dialect.count_tokens("x" * total_compressed)
-    print(f" Total: {orig_tokens:,}t -> {comp_tokens:,}t ({ratio:.1f}x compression)")
+    # Summary: token-based ratio stays consistent with the per-drawer line.
+    ratio = total_orig_tokens / max(total_comp_tokens, 1)
+    print(f" Total: {total_orig_tokens:,}t -> {total_comp_tokens:,}t ({ratio:.1f}x compression)")
     if args.dry_run:
         print(" (dry run -- nothing stored)")
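The new summary logic can be exercised on its own as a quick sanity check. This is a minimal sketch, not code from the repository: summarize is a hypothetical helper, and the stats dictionaries are invented samples that carry only the two renamed keys the loop reads.

```python
def summarize(stats_list):
    """Sum per-drawer token estimates and derive a token-based ratio.

    Mirrors the accumulation in the diff; the helper itself is hypothetical.
    """
    total_orig = sum(s["original_tokens_est"] for s in stats_list)
    total_comp = sum(s["summary_tokens_est"] for s in stats_list)
    # max(..., 1) guards against division by zero when nothing was compressed.
    ratio = total_orig / max(total_comp, 1)
    return total_orig, total_comp, ratio


# Invented sample values standing in for Dialect.compression_stats() output.
sample = [
    {"original_tokens_est": 1200, "summary_tokens_est": 300},
    {"original_tokens_est": 800, "summary_tokens_est": 100},
]

total_orig, total_comp, ratio = summarize(sample)
line = f"Total: {total_orig:,}t -> {total_comp:,}t ({ratio:.1f}x compression)"
print(line)  # Total: 2,000t -> 400t (5.0x compression)
```

Since the totals and the ratio come from the same per-drawer estimates, the summary agrees with the lines printed for each drawer during a dry run.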
