Commit dfe8dbc

fix: restore mempalace compress after stats rename (MemPalace#159)
The honest-stats rename in PR MemPalace#147 changed the keys returned by Dialect.compression_stats() (ratio -> size_ratio, compressed_chars -> summary_chars, original_tokens / compressed_tokens -> original_tokens_est / summary_tokens_est). cmd_compress still reads the old names, so mempalace compress raises KeyError on the first drawer it touches and the feature is effectively dead.

Also fix the summary line at the bottom of cmd_compress. It called count_tokens("x" * total_original), but count_tokens is word-based (max(1, int(len(text.split()) * 1.3))), and a run of repeated "x" characters is a single "word", so both totals were always 1. Accumulate the per-drawer estimates during the main loop instead, and use a token-based ratio so the summary line is consistent with the per-drawer dry-run output.

The storage metadata key names on the compressed collection (compression_ratio, original_tokens) stay the same for compatibility with anything already reading them; only the source of the values is updated.

Fixes MemPalace#159 (points 1 and 2)
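A minimal sketch of why the old summary was stuck at 1. The function below re-implements the word-based estimate exactly as the commit message quotes it; it is a standalone stand-in for Dialect.count_tokens, not the library code itself, and old_summary_totals is a hypothetical name for the removed logic.

```python
def count_tokens(text: str) -> int:
    # Word-based estimate as quoted in the commit message: ~1.3 tokens per word, minimum 1.
    return max(1, int(len(text.split()) * 1.3))


def old_summary_totals(total_original: int, total_compressed: int) -> tuple:
    # HYPOTHETICAL name for the removed summary logic: it turned a character
    # count into a string of "x" and estimated tokens from that string.
    # "x" * n has no whitespace, so split() yields one word (or none when n == 0),
    # and max(1, ...) clamps both estimates to 1 whatever n is.
    return (
        count_tokens("x" * total_original),
        count_tokens("x" * total_compressed),
    )


if __name__ == "__main__":
    print(old_summary_totals(48_000, 6_000))  # (1, 1)
    print(count_tokens("one two"))  # 2
```

The input size never reaches the estimate, which is why the fix sums the per-drawer estimates instead of re-estimating from a synthetic string.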
1 parent 2981433 commit dfe8dbc

1 file changed

Lines changed: 11 additions & 12 deletions

mempalace/cli.py

```diff
@@ -332,16 +332,16 @@ def cmd_compress(args):
 )
 print()
 
-total_original = 0
-total_compressed = 0
+total_orig_tokens = 0
+total_comp_tokens = 0
 compressed_entries = []
 
 for doc, meta, doc_id in zip(docs, metas, ids):
 compressed = dialect.compress(doc, metadata=meta)
 stats = dialect.compression_stats(doc, compressed)
 
-total_original += stats["original_chars"]
-total_compressed += stats["compressed_chars"]
+total_orig_tokens += stats["original_tokens_est"]
+total_comp_tokens += stats["summary_tokens_est"]
 
 compressed_entries.append((doc_id, compressed, meta, stats))
 
```

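The accumulation pattern in this hunk, sketched outside the CLI. fake_compression_stats is hypothetical: it returns only the two renamed token keys this loop reads and reuses the word-based estimate from the commit message; the real Dialect.compression_stats() returns more keys and may estimate differently.

```python
from typing import Dict, Iterable, List, Tuple


def estimate_tokens(text: str) -> int:
    # Word-based heuristic quoted in the commit message.
    return max(1, int(len(text.split()) * 1.3))


def fake_compression_stats(original: str, summary: str) -> Dict[str, int]:
    # HYPOTHETICAL stand-in for Dialect.compression_stats(): only the two
    # post-rename token keys this loop needs; the real method returns more.
    return {
        "original_tokens_est": estimate_tokens(original),
        "summary_tokens_est": estimate_tokens(summary),
    }


def accumulate(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int, List[Dict[str, int]]]:
    # Mirrors the patched loop: sum per-drawer token estimates as each drawer is processed.
    total_orig_tokens = 0
    total_comp_tokens = 0
    per_drawer = []
    for original, summary in pairs:
        stats = fake_compression_stats(original, summary)
        total_orig_tokens += stats["original_tokens_est"]
        total_comp_tokens += stats["summary_tokens_est"]
        per_drawer.append(stats)
    return total_orig_tokens, total_comp_tokens, per_drawer


if __name__ == "__main__":
    drawers = [("a b c d e", "a b"), ("one two three", "one")]
    print(accumulate(drawers)[:2])  # (9, 3)
```

Reusing the estimates already computed for each drawer means the totals are exactly the sum of what the per-drawer lines report.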
```diff
@@ -351,7 +351,8 @@ def cmd_compress(args):
 source = Path(meta.get("source_file", "?")).name
 print(f" [{wing_name}/{room_name}] {source}")
 print(
-f" {stats['original_tokens']}t -> {stats['compressed_tokens']}t ({stats['ratio']:.1f}x)"
+f" {stats['original_tokens_est']}t -> {stats['summary_tokens_est']}t "
+f"({stats['size_ratio']:.1f}x)"
 )
 print(f" {compressed}")
 print()
```

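A sketch contrasting the pre-fix and patched per-drawer formatting. Both function names are hypothetical, the stats dict uses made-up numbers, and the leading indentation of the real print() output is omitted. Feeding a post-rename stats dict to the old formatter raises the KeyError described in the commit message.

```python
from typing import Any, Mapping


def old_format_drawer_line(stats: Mapping[str, Any]) -> str:
    # Pre-fix formatting: these key names no longer exist after the rename.
    return f"{stats['original_tokens']}t -> {stats['compressed_tokens']}t ({stats['ratio']:.1f}x)"


def format_drawer_line(stats: Mapping[str, Any]) -> str:
    # Patched formatting: same layout, renamed keys, split over two f-strings like the diff.
    return (
        f"{stats['original_tokens_est']}t -> {stats['summary_tokens_est']}t "
        f"({stats['size_ratio']:.1f}x)"
    )


if __name__ == "__main__":
    # Made-up numbers in the post-rename shape.
    stats = {"original_tokens_est": 130, "summary_tokens_est": 26, "size_ratio": 4.8}
    print(format_drawer_line(stats))  # 130t -> 26t (4.8x)
    try:
        old_format_drawer_line(stats)
    except KeyError as exc:
        # The failure mode from the commit message, hit on the first drawer processed.
        print(f"KeyError: {exc}")
```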
```diff
@@ -362,8 +363,8 @@ def cmd_compress(args):
 comp_col = client.get_or_create_collection("mempalace_compressed")
 for doc_id, compressed, meta, stats in compressed_entries:
 comp_meta = dict(meta)
-comp_meta["compression_ratio"] = round(stats["ratio"], 1)
-comp_meta["original_tokens"] = stats["original_tokens"]
+comp_meta["compression_ratio"] = round(stats["size_ratio"], 1)
+comp_meta["original_tokens"] = stats["original_tokens_est"]
 comp_col.upsert(
 ids=[doc_id],
 documents=[compressed],
```

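A sketch of the metadata mapping in this hunk, assuming drawer metadata is a flat dict as the dict(meta) copy suggests. build_compressed_meta is a hypothetical helper name; the stored keys compression_ratio and original_tokens are the ones the commit keeps for compatibility, now filled from the renamed stats keys.

```python
from typing import Any, Dict, Mapping


def build_compressed_meta(meta: Mapping[str, Any], stats: Mapping[str, Any]) -> Dict[str, Any]:
    # HYPOTHETICAL helper: copy drawer metadata, then attach compression info.
    comp_meta = dict(meta)  # shallow copy, so the source drawer's metadata is untouched
    # Storage key names are unchanged for existing readers; only the stats keys they read from changed.
    comp_meta["compression_ratio"] = round(stats["size_ratio"], 1)
    comp_meta["original_tokens"] = stats["original_tokens_est"]
    return comp_meta


if __name__ == "__main__":
    meta = {"wing": "demo", "room": "notes"}
    print(build_compressed_meta(meta, {"size_ratio": 4.26, "original_tokens_est": 130}))
```

Keeping the storage names stable means anything already querying the mempalace_compressed collection does not need to change alongside this fix.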
```diff
@@ -376,11 +377,9 @@ def cmd_compress(args):
 print(f" Error storing compressed drawers: {e}")
 sys.exit(1)
 
-# Summary
-ratio = total_original / max(total_compressed, 1)
-orig_tokens = Dialect.count_tokens("x" * total_original)
-comp_tokens = Dialect.count_tokens("x" * total_compressed)
-print(f" Total: {orig_tokens:,}t -> {comp_tokens:,}t ({ratio:.1f}x compression)")
+# Summary: token-based ratio stays consistent with the per-drawer line.
+ratio = total_orig_tokens / max(total_comp_tokens, 1)
+print(f" Total: {total_orig_tokens:,}t -> {total_comp_tokens:,}t ({ratio:.1f}x compression)")
 if args.dry_run:
 print(" (dry run -- nothing stored)")
 
```

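The patched summary computation as a standalone function. summary_line is a hypothetical name, and the leading indentation of the real output is omitted.

```python
def summary_line(total_orig_tokens: int, total_comp_tokens: int) -> str:
    # Token-based ratio, as in the patched summary; max(..., 1) avoids division by zero.
    ratio = total_orig_tokens / max(total_comp_tokens, 1)
    return f"Total: {total_orig_tokens:,}t -> {total_comp_tokens:,}t ({ratio:.1f}x compression)"


if __name__ == "__main__":
    print(summary_line(12_480, 2_600))  # Total: 12,480t -> 2,600t (4.8x compression)
    print(summary_line(0, 0))  # Total: 0t -> 0t (0.0x compression)
```

Because the totals are sums of per-drawer estimates, this is a ratio of sums: large drawers dominate it, and it will generally differ from the mean of the per-drawer ratios.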