Commit 882f05e

fix: restore mempalace compress after stats rename (MemPalace#159)
The honest-stats rename in PR MemPalace#147 changed the keys returned by Dialect.compression_stats() (ratio -> size_ratio, compressed_chars -> summary_chars, original_tokens / compressed_tokens -> original_tokens_est / summary_tokens_est). cmd_compress still reads the old names, so mempalace compress throws KeyError on the first drawer it touches and the feature is effectively dead.

Also fix the summary line at the bottom of cmd_compress. It called count_tokens("x" * total_original), but count_tokens is word-based (max(1, int(len(text.split()) * 1.3))), and a string of repeated xs is a single "word", so both totals were always 1. Accumulate the per-drawer estimates during the main loop instead, and use a token-based ratio so the summary line is self-consistent with the per-drawer dry-run output.

The storage metadata key names on the compressed collection (compression_ratio, original_tokens) stay the same for compatibility with anything already reading them. Only the source of the values is updated.

Fixes MemPalace#159 (points 1 and 2)
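The summary-line bug described above can be reproduced in isolation. The following is a minimal sketch, assuming the heuristic is exactly the expression quoted in the commit message; the standalone `count_tokens` function is a re-implementation for illustration, not the actual `Dialect.count_tokens` method, and the character counts are made up.

```python
def count_tokens(text: str) -> int:
    """Word-based estimate, copied from the expression quoted in the commit message."""
    return max(1, int(len(text.split()) * 1.3))


# A run of repeated characters contains no whitespace, so split() yields a
# single "word" regardless of length. Both totals therefore collapse to 1.
old_orig = count_tokens("x" * 12_000)  # hypothetical total_original
old_comp = count_tokens("x" * 900)     # hypothetical total_compressed
print(old_orig, old_comp)  # 1 1

# On real text the heuristic scales with the number of words.
real = count_tokens("the quick brown fox jumps over the lazy dog")
print(real)  # 11  (9 words * 1.3, truncated)
```

This is why the old summary always reported 1t -> 1t: the character totals were converted into a string with no word boundaries before being passed to a word-counting estimator.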
1 parent 252e440 commit 882f05e

1 file changed

Lines changed: 11 additions & 12 deletions

mempalace/cli.py

@@ -308,16 +308,16 @@ def cmd_compress(args):
     )
     print()
 
-    total_original = 0
-    total_compressed = 0
+    total_orig_tokens = 0
+    total_comp_tokens = 0
     compressed_entries = []
 
     for doc, meta, doc_id in zip(docs, metas, ids):
         compressed = dialect.compress(doc, metadata=meta)
         stats = dialect.compression_stats(doc, compressed)
 
-        total_original += stats["original_chars"]
-        total_compressed += stats["compressed_chars"]
+        total_orig_tokens += stats["original_tokens_est"]
+        total_comp_tokens += stats["summary_tokens_est"]
 
         compressed_entries.append((doc_id, compressed, meta, stats))

@@ -327,7 +327,8 @@ def cmd_compress(args):
             source = Path(meta.get("source_file", "?")).name
             print(f" [{wing_name}/{room_name}] {source}")
             print(
-                f" {stats['original_tokens']}t -> {stats['compressed_tokens']}t ({stats['ratio']:.1f}x)"
+                f" {stats['original_tokens_est']}t -> {stats['summary_tokens_est']}t "
+                f"({stats['size_ratio']:.1f}x)"
             )
             print(f" {compressed}")
             print()
@@ -338,8 +339,8 @@ def cmd_compress(args):
             comp_col = client.get_or_create_collection("mempalace_compressed")
             for doc_id, compressed, meta, stats in compressed_entries:
                 comp_meta = dict(meta)
-                comp_meta["compression_ratio"] = round(stats["ratio"], 1)
-                comp_meta["original_tokens"] = stats["original_tokens"]
+                comp_meta["compression_ratio"] = round(stats["size_ratio"], 1)
+                comp_meta["original_tokens"] = stats["original_tokens_est"]
                 comp_col.upsert(
                     ids=[doc_id],
                     documents=[compressed],
@@ -352,11 +353,9 @@ def cmd_compress(args):
             print(f" Error storing compressed drawers: {e}")
             sys.exit(1)
 
-    # Summary
-    ratio = total_original / max(total_compressed, 1)
-    orig_tokens = Dialect.count_tokens("x" * total_original)
-    comp_tokens = Dialect.count_tokens("x" * total_compressed)
-    print(f" Total: {orig_tokens:,}t -> {comp_tokens:,}t ({ratio:.1f}x compression)")
+    # Summary: token-based ratio stays consistent with the per-drawer line.
+    ratio = total_orig_tokens / max(total_comp_tokens, 1)
+    print(f" Total: {total_orig_tokens:,}t -> {total_comp_tokens:,}t ({ratio:.1f}x compression)")
     if args.dry_run:
         print(" (dry run -- nothing stored)")

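The accumulation pattern the diff introduces can be sketched outside the CLI. This is an illustrative example only: the `summarize` helper and the stats values are hypothetical, and only the key names (`original_tokens_est`, `summary_tokens_est`) and the `max(..., 1)` guard come from the diff.

```python
# Hypothetical per-drawer stats; only the key names are taken from the diff.
per_drawer_stats = [
    {"original_tokens_est": 1200, "summary_tokens_est": 150},
    {"original_tokens_est": 300, "summary_tokens_est": 100},
]


def summarize(stats_list):
    """Sum per-drawer token estimates and derive a token-based ratio."""
    total_orig = sum(s["original_tokens_est"] for s in stats_list)
    total_comp = sum(s["summary_tokens_est"] for s in stats_list)
    # max(..., 1) mirrors the diff's guard against dividing by zero.
    ratio = total_orig / max(total_comp, 1)
    return total_orig, total_comp, ratio


total_orig, total_comp, ratio = summarize(per_drawer_stats)
print(f" Total: {total_orig:,}t -> {total_comp:,}t ({ratio:.1f}x compression)")

# Reading a pre-rename key fails the way the commit message describes.
try:
    per_drawer_stats[0]["ratio"]
except KeyError as exc:
    print(f"KeyError: {exc}")
```

Because the totals and the ratio are derived from the same per-drawer estimates, the summary line cannot disagree with the per-drawer token counts printed during a dry run.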