Commit c0afe5e

fix: restore mempalace compress after stats rename (MemPalace#159)
The honest-stats rename in PR MemPalace#147 changed the keys returned by Dialect.compression_stats() (ratio -> size_ratio, compressed_chars -> summary_chars, original_tokens / compressed_tokens -> original_tokens_est / summary_tokens_est). cmd_compress still reads the old names, so mempalace compress throws KeyError on the first drawer it touches and the feature is effectively dead.

Also fix the summary line at the bottom of cmd_compress. It called count_tokens("x" * total_original), but count_tokens is word-based (max(1, int(len(text.split()) * 1.3))), and a string of repeated xs is a single "word", so both totals were always 1. Accumulate the per-drawer estimates during the main loop instead, and use a token-based ratio so the summary line is self-consistent with the per-drawer dry-run output.

The storage metadata key names on the compressed collection (compression_ratio, original_tokens) stay the same for compatibility with anything already reading them. Only the source of the values is updated.

Fixes MemPalace#159 (points 1 and 2)
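The summary-line bug described above can be reproduced in isolation. The following is a minimal sketch, assuming a standalone count_tokens that uses only the word-based formula quoted in the message; the real Dialect.count_tokens may differ in details, and the character total and sample strings are made up for illustration.

```python
def count_tokens(text: str) -> int:
    """Word-based estimate, using the formula quoted in the commit message."""
    return max(1, int(len(text.split()) * 1.3))


# Hypothetical character total accumulated across all drawers.
total_original_chars = 12_000

# Old summary approach: a run of "x" contains no whitespace, so split()
# returns a single word and the estimate is int(1 * 1.3) == 1.
old_summary_tokens = count_tokens("x" * total_original_chars)

# Approach taken by the fix: sum the estimates computed per drawer.
# The two strings stand in for real drawer contents.
drawers = ["alpha beta gamma delta", "one two three"]
summed_tokens = sum(count_tokens(d) for d in drawers)

print(old_summary_tokens, summed_tokens)
```

The first value is 1 regardless of how large the character total is, which is why the old summary line carried no information.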
1 parent 68e3414 commit c0afe5e

1 file changed

Lines changed: 11 additions & 12 deletions

mempalace/cli.py

@@ -291,16 +291,16 @@ def cmd_compress(args):
     )
     print()
 
-    total_original = 0
-    total_compressed = 0
+    total_orig_tokens = 0
+    total_comp_tokens = 0
     compressed_entries = []
 
     for doc, meta, doc_id in zip(docs, metas, ids):
         compressed = dialect.compress(doc, metadata=meta)
         stats = dialect.compression_stats(doc, compressed)
 
-        total_original += stats["original_chars"]
-        total_compressed += stats["compressed_chars"]
+        total_orig_tokens += stats["original_tokens_est"]
+        total_comp_tokens += stats["summary_tokens_est"]
 
         compressed_entries.append((doc_id, compressed, meta, stats))

@@ -310,7 +310,8 @@ def cmd_compress(args):
             source = Path(meta.get("source_file", "?")).name
             print(f" [{wing_name}/{room_name}] {source}")
             print(
-                f" {stats['original_tokens']}t -> {stats['compressed_tokens']}t ({stats['ratio']:.1f}x)"
+                f" {stats['original_tokens_est']}t -> {stats['summary_tokens_est']}t "
+                f"({stats['size_ratio']:.1f}x)"
             )
             print(f" {compressed}")
             print()
@@ -321,8 +322,8 @@ def cmd_compress(args):
             comp_col = client.get_or_create_collection("mempalace_compressed")
             for doc_id, compressed, meta, stats in compressed_entries:
                 comp_meta = dict(meta)
-                comp_meta["compression_ratio"] = round(stats["ratio"], 1)
-                comp_meta["original_tokens"] = stats["original_tokens"]
+                comp_meta["compression_ratio"] = round(stats["size_ratio"], 1)
+                comp_meta["original_tokens"] = stats["original_tokens_est"]
                 comp_col.upsert(
                     ids=[doc_id],
                     documents=[compressed],
@@ -335,11 +336,9 @@ def cmd_compress(args):
             print(f" Error storing compressed drawers: {e}")
             sys.exit(1)
 
-    # Summary
-    ratio = total_original / max(total_compressed, 1)
-    orig_tokens = Dialect.count_tokens("x" * total_original)
-    comp_tokens = Dialect.count_tokens("x" * total_compressed)
-    print(f" Total: {orig_tokens:,}t -> {comp_tokens:,}t ({ratio:.1f}x compression)")
+    # Summary: token-based ratio stays consistent with the per-drawer line.
+    ratio = total_orig_tokens / max(total_comp_tokens, 1)
+    print(f" Total: {total_orig_tokens:,}t -> {total_comp_tokens:,}t ({ratio:.1f}x compression)")
     if args.dry_run:
         print(" (dry run -- nothing stored)")
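The accumulate-then-divide pattern from the last hunk can be exercised on its own. This is a rough sketch, not the project's code: summarize is a hypothetical helper, and the sample dicts only mimic the two keys of the compression_stats() result that the summary reads.

```python
def summarize(per_drawer_stats):
    """Sum per-drawer token estimates and derive a token-based ratio.

    Hypothetical helper; cmd_compress does this inline in its main loop.
    """
    total_orig_tokens = 0
    total_comp_tokens = 0
    for stats in per_drawer_stats:
        total_orig_tokens += stats["original_tokens_est"]
        total_comp_tokens += stats["summary_tokens_est"]
    # max(..., 1) guards against division by zero when nothing was compressed.
    ratio = total_orig_tokens / max(total_comp_tokens, 1)
    return total_orig_tokens, total_comp_tokens, ratio


# Made-up stats for two drawers, using the renamed keys.
sample = [
    {"original_tokens_est": 130, "summary_tokens_est": 13},
    {"original_tokens_est": 260, "summary_tokens_est": 26},
]
orig, comp, ratio = summarize(sample)
summary_line = f"Total: {orig:,}t -> {comp:,}t ({ratio:.1f}x compression)"
print(summary_line)
```

Because both totals and the ratio come from the same per-drawer estimates, the summary cannot drift from the per-drawer dry-run lines the way the character-based ratio could.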
