Data for the quantitative study of (Vedic) Sanskrit
Updated Mar 5, 2026 - Python
Main application code for Ambuda, a breakthrough Sanskrit library (ambuda.org)
Code and data for "Summarising Historical Text in Modern Languages" (EACL 2021)
[PRL 2025, APSIPA 2022] Syllable Analysis Data Augmentation (SADA). This project introduces a glyph dictionary and a grammar-aware augmentation strategy designed to improve Khmer palm leaf manuscript recognition. By modeling the language's grammatical structure, it supports more robust OCR performance in low-resource settings.
Raw dataset for Old Persian cuneiform
Official releases of the PROIEL treebank of ancient Indo-European languages
A tool for exploring the Linear A corpus
An Ancient Greek Morphology Tagger
Semantic Dictionaries for Ancient Languages
No-nonsense simple transliteration between writing systems, mostly of Semitic origin
Code and sample images described in the paper "DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning"
The Ancient Greek dictionary for Hunspell (grc_GR for Notepad++, Google Chrome, Vivaldi, etc.).
A Metafont glyph dataset that helps people define CJK-like glyphs with their Metafont scripts using machine learning
Documentation for the electronic Babylonian Library (eBL) project
[SSDA 2023] This project explores advanced document image recognition methods tailored for low-resource historical German manuscripts.
An array of tools for Sanskrit for tasks such as noun declension and verb conjugation.
A program for creating a searchable local language dictionary based (mainly) on dumped Wiktionary data. Lets the user collect definitions, which can be exported as a machine-readable flashcard file. Currently supports Latin, Ancient Greek, and Old English.
Online decimal-to-Maya numeral converter.
Contains a Text-Fabric dataset of the Ugaritic corpus.
The Jekyll repository that holds the syllabus for the Ancient Language Processing course