Commit 794dc81

test(dialect): strengthen compression_stats and count_tokens assertions
Adds coverage on top of the honest-stats test fix that already landed in main:

- test_stats now also asserts that original_tokens_est is strictly greater than summary_tokens_est, which catches a class of regressions where the token estimator flattens to a constant.
- test_count_tokens gains edge cases for the empty string and the single-word input. The empty string exercises the max(1, ...) guard in Dialect.count_tokens, so a future refactor that drops the guard fails loudly instead of silently returning 0. The single-word input pins the boundary just above it, where int(1 * 1.3) is already 1.
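To make the guard concrete, here is a minimal sketch of the heuristic as the test comments describe it. Both functions are hypothetical stand-ins, not the project's actual Dialect.count_tokens; the whitespace split is an assumption, and the 1.3 factor is taken from the comment in the diff.

```python
def count_tokens_sketch(text: str) -> int:
    """Hypothetical stand-in for Dialect.count_tokens (assumed behavior)."""
    words = text.split()  # assumption: words are whitespace-delimited
    return max(1, int(len(words) * 1.3))


def count_tokens_unguarded(text: str) -> int:
    """Same heuristic with the max(1, ...) guard dropped."""
    return int(len(text.split()) * 1.3)


print(count_tokens_sketch("hello world"))  # 2
print(count_tokens_sketch("one"))          # 1
print(count_tokens_sketch(""))             # 1
print(count_tokens_unguarded(""))          # 0
```

Under these assumptions only the empty string separates the guarded and unguarded versions, which is why that case is the one that would fail if the guard were removed.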
1 parent 71736a3 commit 794dc81

1 file changed

Lines changed: 5 additions & 0 deletions

tests/test_dialect.py

@@ -111,9 +111,14 @@ def test_stats(self):
         stats = d.compression_stats(original, compressed)
         assert stats["size_ratio"] > 1
         assert stats["original_chars"] > stats["summary_chars"]
+        assert stats["original_tokens_est"] > stats["summary_tokens_est"]

     def test_count_tokens(self):
+        # count_tokens uses a word-based heuristic (~1.3 tokens per word).
+        # "hello world" is 2 words -> max(1, int(2 * 1.3)) == 2.
         assert Dialect.count_tokens("hello world") == 2
+        assert Dialect.count_tokens("") == 1
+        assert Dialect.count_tokens("one") == 1


 class TestZettelEncoding:
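The strict inequality added to test_stats can be illustrated in isolation. This is a sketch only: the stats dictionaries below are hypothetical values rather than output from compression_stats, and check_token_stats is an illustrative helper that does not exist in the test file.

```python
def check_token_stats(stats: dict) -> None:
    """Illustrative helper mirroring the assertion added to test_stats."""
    assert stats["original_tokens_est"] > stats["summary_tokens_est"]


# Hypothetical stats from a healthy estimator: the original is longer.
healthy = {"original_tokens_est": 120, "summary_tokens_est": 30}

# Hypothetical stats from a regressed estimator that flattens to a constant.
flattened = {"original_tokens_est": 1, "summary_tokens_est": 1}

check_token_stats(healthy)  # passes silently

try:
    check_token_stats(flattened)
except AssertionError:
    caught = True
else:
    caught = False

print(caught)  # True
```

A non-strict comparison (>=) would accept the flattened case, so the strictness is what gives the assertion its value.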
