This is the changelog for the open source version of tiktoken.
- Build wheels for Python 3.14
- Build musllinux aarch64 wheels
- Support for free-threaded Python
- Update version of `pyo3` and `rustc-hash`
- Avoid use of `blobfile` for reading local files
- Recognise `gpt-5` model identifier
- Minor performance improvement for file reading
- Support for GPT-5
- Update version of `pyo3`
- Use new Rust edition
- Fix special token handling in `encode_to_numpy`
- Better error handling
- Improvements to private APIs
- Support for newer models
- Improvements to private APIs
- Support for `o1` and `o3` models
- Better error messages when loading invalid vocabulary files
- Support for encoding to numpy arrays
- Delayed imports when not strictly necessary
- Support for `o1-` and `chatgpt-4o-` models
- Build wheels for Python 3.13
- Add possessive quantifiers to limit backtracking in regular expressions, thanks to @l0rinc!
- Provide a better error message and type for invalid token decode
- Permit tuples in type hints
- Better error message for passing invalid input to `get_encoding`
- Better error messages during plugin loading
- Add a `__version__` attribute
- Update versions of `pyo3`, `regex`, `fancy-regex`
- Drop support for Python 3.8
- Support for `gpt-4o`
- Performance improvements
- Optimise regular expressions for a 20% performance improvement, thanks to @paplorinc!
- Add `text-embedding-3-*` models to `encoding_for_model`
- Check content hash for downloaded files
- Allow pickling `Encoding` objects; a registered `Encoding` will be pickled by reference
- Work around a PyO3 bug in frozenset conversion
Thank you to @paplorinc, @mdwelsh, @Praneet460!
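The content-hash check mentioned above can be sketched in a few lines of Python; `verify_download` and `expected_sha256` are illustrative names for this sketch, not tiktoken's actual API:

```python
import hashlib

def verify_download(data: bytes, expected_sha256: str) -> bytes:
    # Reject a downloaded vocabulary file whose content does not
    # match the published hash (guards against corrupt or tampered caches).
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    return data
```

Verifying the hash at load time means a truncated or stale cache entry fails loudly instead of producing a silently wrong vocabulary.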
- Build wheels for Python 3.12
- Update version of PyO3 to allow multiple imports
- Avoid permission errors when using default cache logic
- Add `encoding_name_for_model`, undo some renames to variables that are implementation details
- Add `tiktoken._educational` submodule to better document how byte pair encoding works
- Ensure `encoding_for_model` knows about several new models
- Add `decode_with_offsets`
- Better error for failures with the plugin mechanism
- Make more tests public
- Update versions of dependencies
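The byte pair encoding procedure that `tiktoken._educational` walks through can be condensed into a sketch like the following (a toy merge-ranks table, not the submodule's actual code):

```python
def bpe_encode(mergeable_ranks: dict[bytes, int], word: bytes) -> list[int]:
    # Start with one piece per byte, then repeatedly merge the adjacent
    # pair whose concatenation has the lowest rank in the vocabulary.
    parts = [bytes([b]) for b in word]
    while True:
        best_rank = None
        best_i = None
        for i in range(len(parts) - 1):
            rank = mergeable_ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_rank, best_i = rank, i
        if best_i is None:
            break  # no adjacent pair is in the vocabulary; we are done
        parts = parts[:best_i] + [parts[best_i] + parts[best_i + 1]] + parts[best_i + 2:]
    return [mergeable_ranks[p] for p in parts]
```

With a toy vocabulary `{b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4}`, encoding `b"abc"` merges `a`+`b` first, then `ab`+`c`, yielding a single token.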
- Add `decode_batch` and `decode_bytes_batch`
- Improve error messages and handling
`tiktoken` will now make a best-effort attempt to replace surrogate pairs with the corresponding Unicode character and will replace lone surrogates with the Unicode replacement character.
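The best-effort behaviour described above can be illustrated with a standalone sketch — an approximation of the idea in pure Python, not tiktoken's Rust implementation:

```python
def fix_surrogates(text: str) -> str:
    # Combine high/low surrogate pairs into the real code point;
    # replace any lone surrogate with U+FFFD (the replacement character).
    out = []
    i = 0
    while i < len(text):
        c = text[i]
        if "\ud800" <= c <= "\udbff":  # high surrogate
            if i + 1 < len(text) and "\udc00" <= text[i + 1] <= "\udfff":
                combined = 0x10000 + ((ord(c) - 0xD800) << 10) + (ord(text[i + 1]) - 0xDC00)
                out.append(chr(combined))
                i += 2
                continue
            out.append("\ufffd")  # lone high surrogate
        elif "\udc00" <= c <= "\udfff":
            out.append("\ufffd")  # lone low surrogate
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

For example, the surrogate pair U+D83D U+DE00 combines into U+1F600, while an unpaired U+D800 becomes U+FFFD.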
- Add encoding for GPT-4
- Build aarch64 wheels
- Make `blobfile` an optional dependency
Thank you to @messense for the environment variable that makes cargo not OOM under emulation!
- Improve performance by 5-20%; thank you to @nistath!
- Add `gpt-3.5-turbo` models to `encoding_for_model`
- Add prefix matching to `encoding_for_model` to better support future model versions
- Fix a bug in the README instructions on extending tiktoken
- Update the set of available encodings
- Add packaging metadata
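Prefix matching in `encoding_for_model` works roughly like the sketch below; the tables here are small illustrative stand-ins, not tiktoken's full model tables:

```python
# Illustrative tables only — not tiktoken's complete mappings.
MODEL_TO_ENCODING = {"gpt-4": "cl100k_base"}
MODEL_PREFIX_TO_ENCODING = {"gpt-3.5-turbo-": "cl100k_base"}

def encoding_name_for_model(model_name: str) -> str:
    # Exact matches win; otherwise prefix matching lets dated snapshots
    # (e.g. "gpt-3.5-turbo-0301") resolve without a table update.
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    for prefix, encoding in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding
    raise KeyError(f"Could not find encoding for {model_name}")
```

The prefix table is what lets future model versions that follow an existing naming scheme work without a library release.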
- Add `tiktoken.encoding_for_model` to get the encoding for a specific model
- Improve portability of caching logic
Thank you to @fritzo, @arvid220u, @khanhvu207, @henriktorget for various small corrections
- Avoid use of `blobfile` for public files
- Add support for Python 3.8
- Add `py.typed`
- Improve the public tests
- Initial release