TEI-reader

Python 3 Library for Reading the Text Content and Metadata of TEI P5 (Lite) Files

The library focuses on extracting the main text content from a file and providing the available metadata about the text.

It was originally created to support importing TEI formatted corpora into GrETEL, using the corpus2alpino library.

TL;DR

pip install tei-reader

from tei_reader import TeiReader
reader = TeiReader()
corpora = reader.read_file('example-tei.xml') # or read_string
print(corpora.text)

# show element attributes before the actual element text
print(corpora.tostring(lambda x, text: str(list(a.key + '=' + a.text for a in x.attributes)) + text))

More Explanation

A reader can be opened using TeiReader(). It is then possible to either call read_file(file_name) or read_string(str). Both will return a Corpora object containing the following properties:

Property	Description
`corpora[]`	A corpora can contain sub-corpora.
`documents[]`	The `Document` objects directly part of this corpora.

Corpora and Document all inherit from Element. In all objects deriving from this it is possible to call:

Property	Description
`attributes{}`	Contain attributes applicable to this element. If an attribute contains attributes these are also returned. (e.g. `encodingDesc::editorialDecl::normalization`)
`text`	Get the entire text content as `str`
`divisions[]`	Recursively get all the text divisions in document order. If an element contains parts or text without tag. Those will be returned in order and wrapped with a `PlaceholderDivision`.
`all_parts[]`	Recursively get the parts in document order constituting the entire text e.g. if something has emphasis, a footnote or is marked as foreign. Text without a container element will be returned in order and wrapped with a `PlaceholderPart`.
`parts[]`	Get the parts in document order directly below the current element.

Attribute, PlaceholderDivision and PlaceholderPart all support the same properties as Element.

Upload to PyPi

python setup.py sdist
twine upload dist/*

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
tei_reader		tei_reader
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
requirements.in		requirements.in
requirements.txt		requirements.txt
setup.py		setup.py
tei.tar		tei.tar

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TEI-reader

TL;DR

More Explanation

Upload to PyPi

About

Uh oh!

Releases 1

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TEI-reader

TL;DR

More Explanation

Upload to PyPi

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages