State-of-the-art multilingual handwriting text recognition.
Thulium is a production-ready Python library for offline handwritten text recognition (HTR) supporting 52+ languages across Latin, Cyrillic, Greek, Arabic, Hebrew, Devanagari, Chinese, Japanese, Korean, and Georgian scripts.
- 52+ Languages — Comprehensive multilingual support with script-aware processing
- Production Ready — Optimized inference with ONNX export and mixed precision
- State-of-the-Art — CNN/ViT backbones with Transformer/LSTM sequence heads
- Explainable AI — Attention visualization, saliency maps, and confidence analysis
- Flexible Decoding — CTC beam search with n-gram and neural language models
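
To make the decoding bullet concrete, here is a minimal sketch of CTC prefix beam search in plain Python. It is not Thulium's decoder: the function name `ctc_prefix_beam_search`, its parameters, and the toy two-symbol vocabulary are assumptions for illustration, and no n-gram or neural language model is applied. In a fuller decoder, a language model score would typically be added whenever a prefix is extended.

```python
import math
from collections import defaultdict

NEG_INF = -math.inf


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) for a few log-probabilities."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(log_probs, beam_width=4, blank=0):
    """Illustrative CTC prefix beam search without a language model.

    log_probs: list of frames, each a list of per-symbol log-probabilities.
    Returns (label_indices, log_probability) of the best collapsed sequence.
    """
    # Each prefix tracks two scores: paths ending in blank, paths ending in non-blank.
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for s, lp in enumerate(frame):
                if s == blank:
                    # A blank keeps the prefix unchanged.
                    n_b, n_nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(n_b, p_b + lp, p_nb + lp), n_nb)
                    continue
                new_prefix = prefix + (s,)
                n_b, n_nb = next_beams[new_prefix]
                if prefix and prefix[-1] == s:
                    # Repeating the last symbol only extends the prefix after a blank...
                    next_beams[new_prefix] = (n_b, logsumexp(n_nb, p_b + lp))
                    # ...otherwise the repeat collapses into the same prefix.
                    o_b, o_nb = next_beams[prefix]
                    next_beams[prefix] = (o_b, logsumexp(o_nb, p_nb + lp))
                else:
                    next_beams[new_prefix] = (n_b, logsumexp(n_nb, p_b + lp, p_nb + lp))
        # Keep only the most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best_prefix, best_scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(*best_scores)


# Toy input: two frames over the vocabulary [blank, "a"].
# The single best path is blank-blank (0.6 * 0.6 = 0.36), but the paths that
# collapse to "a" sum to 0.24 + 0.24 + 0.16 = 0.64, so beam search prefers "a".
frames = [[math.log(0.6), math.log(0.4)]] * 2
labels, score = ctc_prefix_beam_search(frames)
```

The toy input shows why beam search can beat greedy decoding: greedy picks the most likely symbol per frame and returns an empty string here, while summing over all alignments favors the label sequence `[1]`.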

Installation:

    pip install thulium-htr

For GPU acceleration:

    pip install "thulium-htr[gpu]"

Quick start:

    from thulium import HTRPipeline, recognize_image

    # Single image recognition
    result = recognize_image("document.png", language="en")
    print(result.text)

    # Batch recognition with confidence scores
    pipeline = HTRPipeline.from_pretrained("thulium-base-multilingual")
    # Placeholder inputs: hypothetical file paths, assuming recognize_batch
    # accepts paths the same way recognize_image does
    images = ["letter_en.png", "letter_de.png", "letter_fr.png"]
    results = pipeline.recognize_batch(images, languages=["en", "de", "fr"])
    for r in results:
        print(f"{r.text} (confidence: {r.confidence:.2%})")

52+ languages across 10 scripts:

| Region | Languages |
|---|---|
| Western Europe | English, German, French, Spanish, Italian, Portuguese, Dutch |
| Scandinavia | Swedish, Norwegian, Danish, Finnish, Icelandic |
| Eastern Europe | Polish, Czech, Hungarian, Romanian, Bulgarian, Ukrainian, Russian |
| Baltic | Lithuanian, Latvian, Estonian |
| Caucasus | Georgian, Armenian, Azerbaijani |
| Middle East | Arabic, Hebrew, Persian, Turkish |
| South Asia | Hindi, Bengali, Tamil, Telugu, Urdu |
| East Asia | Chinese, Japanese, Korean |

Documentation:

| Guide | Description |
|---|---|
| Getting Started | Installation and first steps |
| API Reference | Complete API documentation |
| Model Zoo | Pretrained model catalog |
| Training Guide | Train custom models |
| Architecture | System design overview |

Benchmarks on IAM Handwriting Database:

| Model | CER | WER | Latency |
|---|---|---|---|
| thulium-tiny | 5.2% | 14.1% | 12ms |
| thulium-base | 3.8% | 10.2% | 28ms |
| thulium-large | 2.9% | 7.8% | 65ms |

Measured on an NVIDIA A100 with batch size 1 and PyTorch 2.0+.
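
For readers unfamiliar with the CER and WER columns, the sketch below shows how these metrics are commonly computed: the Levenshtein edit distance between the reference and the recognized text, divided by the reference length, over characters for CER and over whitespace-separated words for WER. The helper names `edit_distance`, `cer`, and `wer` are illustrative and not part of the Thulium API, and how the figures in the table were aggregated over the test set is not stated here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


# One wrong character out of 11, and one wrong word out of 2.
sample_cer = cer("hello world", "hallo world")
sample_wer = wer("hello world", "hallo world")
```

A single character error costs far more in WER than in CER, which is why the WER column is consistently higher than the CER column for the same model.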

Citation:

    @software{thulium2025,
      title={Thulium: Multilingual Handwriting Recognition},
      author={Thulium Authors},
      year={2025},
      url={https://github.com/thulium-dev/thulium}
    }

We welcome contributions! See CONTRIBUTING.md for guidelines.

Licensed under Apache 2.0. See LICENSE for details.