Commit 79e1647

test(dialect): update assertions for new honest-stats API
PR #147 renamed compression_stats fields (ratio -> size_ratio, compressed_chars -> summary_chars) and switched count_tokens to a word-based heuristic, but the test_dialect tests from PR #131 still assert the old API and fail on main.

Bring TestCompressionStats.test_stats in line with the current dict keys (size_ratio, summary_chars, summary_tokens_est) and update test_count_tokens to match the word-based formula, with extra coverage for the empty and single-word edge cases around max(1, ...).

This unblocks CI on main, which currently fails on these two tests.
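For reviewers who want to check the new expected values by hand, here is a minimal sketch of the word-based heuristic as the test comments describe it (about 1.3 tokens per word, with a floor of 1). The function name `estimate_tokens` and the whitespace split are assumptions for illustration; the actual `Dialect.count_tokens` body is not shown in this commit and may differ.

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Hypothetical stand-in for Dialect.count_tokens (not the real code)."""
    # Assumption: words are separated by whitespace.
    words = text.split()
    # The max(1, ...) floor means empty input still reports one token.
    return max(1, int(len(words) * tokens_per_word))


def old_estimate_tokens(text: str) -> int:
    """The character-based formula the old assertion encoded."""
    return len(text) // 3
```

Under these assumptions the two formulas disagree on "hello world" (2 for the word-based estimate, 3 for the old character-based one), which matches the failure the commit message describes.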
1 parent 68e3414 commit 79e1647

1 file changed

Lines changed: 8 additions & 3 deletions

tests/test_dialect.py

@@ -109,11 +109,16 @@ def test_stats(self):
         original = "We decided to use GraphQL instead of REST. " * 10
         compressed = d.compress(original)
         stats = d.compression_stats(original, compressed)
-        assert stats["ratio"] > 1
-        assert stats["original_chars"] > stats["compressed_chars"]
+        assert stats["size_ratio"] > 1
+        assert stats["original_chars"] > stats["summary_chars"]
+        assert stats["original_tokens_est"] > stats["summary_tokens_est"]

     def test_count_tokens(self):
-        assert Dialect.count_tokens("hello world") == len("hello world") // 3
+        # count_tokens uses a word-based heuristic (~1.3 tokens per word).
+        # "hello world" is 2 words -> max(1, int(2 * 1.3)) == 2.
+        assert Dialect.count_tokens("hello world") == 2
+        assert Dialect.count_tokens("") == 1
+        assert Dialect.count_tokens("one") == 1


 class TestZettelEncoding:
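
The updated test_stats assertions rely on five dict keys. As a rough illustration of a stats dict that would satisfy them, the sketch below assumes size_ratio is original characters divided by summary characters (so a value above 1 means the summary is smaller) and reuses the word-based token estimate from the test comments. `compression_stats_sketch` and `estimate_tokens` are hypothetical names; the real `Dialect.compression_stats` is not part of this diff and may compute these fields differently.

```python
def estimate_tokens(text: str) -> int:
    # Assumed heuristic from the test comments: ~1.3 tokens per word, floor of 1.
    return max(1, int(len(text.split()) * 1.3))


def compression_stats_sketch(original: str, summary: str) -> dict:
    """Hypothetical stats dict using the key names asserted in test_stats."""
    original_chars = len(original)
    summary_chars = len(summary)
    return {
        "original_chars": original_chars,
        "summary_chars": summary_chars,
        "original_tokens_est": estimate_tokens(original),
        "summary_tokens_est": estimate_tokens(summary),
        # Assumption: original over summary; guard against an empty summary.
        "size_ratio": original_chars / max(summary_chars, 1),
    }


stats = compression_stats_sketch(
    "We decided to use GraphQL instead of REST. " * 10,
    "Use GraphQL, not REST.",
)
```

With a summary shorter than the original, all three assertions from the updated test hold for this sketch.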
