Commit 6578851

fix: restore mempalace compress after stats rename (MemPalace#159)
The honest-stats rename in PR MemPalace#147 changed the keys returned by Dialect.compression_stats() (ratio -> size_ratio, compressed_chars -> summary_chars, original_tokens / compressed_tokens -> original_tokens_est / summary_tokens_est). cmd_compress still reads the old names, so mempalace compress raises KeyError on the first drawer it touches and the feature is effectively dead.

Also fix the summary line at the bottom of cmd_compress. It called count_tokens("x" * total_original), but count_tokens is word-based (max(1, int(len(text.split()) * 1.3))), and a string of repeated "x" characters is a single "word", so both totals were always 1. Accumulate the per-drawer token estimates during the main loop instead, and use a token-based ratio so the summary line is self-consistent with the per-drawer dry-run output.

The storage metadata key names on the compressed collection (compression_ratio, original_tokens) stay the same for compatibility with anything already reading them. Only the source of the values is updated.

Fixes MemPalace#159 (points 1 and 2)
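
The summary bug follows directly from the word-based formula quoted in the message. The sketch below is a standalone reimplementation of that formula, not an import of mempalace's Dialect.count_tokens, and the 12,000-character total is an arbitrary illustrative value.

```python
def count_tokens(text: str) -> int:
    # Word-based estimate quoted in the commit message:
    # about 1.3 tokens per whitespace-separated word, never less than 1.
    return max(1, int(len(text.split()) * 1.3))


# Old summary path: turn a character total into a placeholder string.
total_original_chars = 12_000  # arbitrary illustrative value
placeholder = "x" * total_original_chars

# No whitespace means split() returns a single "word"...
print(len(placeholder.split()))  # 1
# ...so the estimate collapses to max(1, int(1 * 1.3)) == 1.
print(count_tokens(placeholder))  # 1

# Ordinary prose is estimated as intended: 9 words -> int(11.7) == 11.
print(count_tokens("the quick brown fox jumps over the lazy dog"))  # 11
```

However large the character total, the old summary could only ever print 1t for both sides.
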
1 parent 1056018 commit 6578851

1 file changed

Lines changed: 11 additions & 12 deletions


mempalace/cli.py

@@ -340,16 +340,16 @@ def cmd_compress(args):
     )
     print()
 
-    total_original = 0
-    total_compressed = 0
+    total_orig_tokens = 0
+    total_comp_tokens = 0
     compressed_entries = []
 
     for doc, meta, doc_id in zip(docs, metas, ids):
         compressed = dialect.compress(doc, metadata=meta)
         stats = dialect.compression_stats(doc, compressed)
 
-        total_original += stats["original_chars"]
-        total_compressed += stats["compressed_chars"]
+        total_orig_tokens += stats["original_tokens_est"]
+        total_comp_tokens += stats["summary_tokens_est"]
 
         compressed_entries.append((doc_id, compressed, meta, stats))
 
@@ -359,7 +359,8 @@ def cmd_compress(args):
             source = Path(meta.get("source_file", "?")).name
             print(f" [{wing_name}/{room_name}] {source}")
             print(
-                f" {stats['original_tokens']}t -> {stats['compressed_tokens']}t ({stats['ratio']:.1f}x)"
+                f" {stats['original_tokens_est']}t -> {stats['summary_tokens_est']}t "
+                f"({stats['size_ratio']:.1f}x)"
             )
             print(f" {compressed}")
             print()
@@ -370,8 +371,8 @@ def cmd_compress(args):
             comp_col = client.get_or_create_collection("mempalace_compressed")
             for doc_id, compressed, meta, stats in compressed_entries:
                 comp_meta = dict(meta)
-                comp_meta["compression_ratio"] = round(stats["ratio"], 1)
-                comp_meta["original_tokens"] = stats["original_tokens"]
+                comp_meta["compression_ratio"] = round(stats["size_ratio"], 1)
+                comp_meta["original_tokens"] = stats["original_tokens_est"]
                 comp_col.upsert(
                     ids=[doc_id],
                     documents=[compressed],
@@ -384,11 +385,9 @@ def cmd_compress(args):
             print(f" Error storing compressed drawers: {e}")
             sys.exit(1)
 
-    # Summary
-    ratio = total_original / max(total_compressed, 1)
-    orig_tokens = Dialect.count_tokens("x" * total_original)
-    comp_tokens = Dialect.count_tokens("x" * total_compressed)
-    print(f" Total: {orig_tokens:,}t -> {comp_tokens:,}t ({ratio:.1f}x compression)")
+    # Summary: token-based ratio stays consistent with the per-drawer line.
+    ratio = total_orig_tokens / max(total_comp_tokens, 1)
+    print(f" Total: {total_orig_tokens:,}t -> {total_comp_tokens:,}t ({ratio:.1f}x compression)")
     if args.dry_run:
         print(" (dry run -- nothing stored)")
 

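A minimal sketch of the post-fix bookkeeping, pulled out of cmd_compress into two hypothetical helpers (storage_metadata and summary_line are illustration names, not functions in mempalace/cli.py). The stats dicts are made-up values that only use the key names introduced in MemPalace#147; in the real command they come from Dialect.compression_stats().

```python
from typing import Any, Dict, List, Mapping


def storage_metadata(meta: Mapping[str, Any], stats: Mapping[str, Any]) -> Dict[str, Any]:
    # Storage key names stay as before; only the stats keys they read from changed.
    comp_meta = dict(meta)
    comp_meta["compression_ratio"] = round(stats["size_ratio"], 1)
    comp_meta["original_tokens"] = stats["original_tokens_est"]
    return comp_meta


def summary_line(all_stats: List[Mapping[str, Any]]) -> str:
    # Sum the per-drawer token estimates instead of re-estimating from a char total.
    total_orig_tokens = sum(s["original_tokens_est"] for s in all_stats)
    total_comp_tokens = sum(s["summary_tokens_est"] for s in all_stats)
    # max(..., 1) guards against division by zero, as in cmd_compress.
    ratio = total_orig_tokens / max(total_comp_tokens, 1)
    return f" Total: {total_orig_tokens:,}t -> {total_comp_tokens:,}t ({ratio:.1f}x compression)"


# Made-up stats dicts using the post-#147 key names (hypothetical values).
sample_stats = [
    {"original_tokens_est": 1300, "summary_tokens_est": 130, "size_ratio": 8.04},
    {"original_tokens_est": 2600, "summary_tokens_est": 260, "size_ratio": 9.51},
]

print(storage_metadata({"source_file": "notes.md"}, sample_stats[0]))
# {'source_file': 'notes.md', 'compression_ratio': 8.0, 'original_tokens': 1300}
print(summary_line(sample_stats))
# " Total: 3,900t -> 390t (10.0x compression)"
```

Because the totals are sums of the same original_tokens_est and summary_tokens_est values printed on each per-drawer line, the summary can no longer drift from the per-drawer output the way the old character-derived totals did.
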