
cogitoergo-sum/markdown_extract

Markdown Extract

A simple Python library that parses Markdown files into sections based on their headers.

Installation

Install via pip:

pip install markdown-extract

Usage

from markdown_extract import MarkdownExtractor

markdown_content = """
# Section 1
Some content here.

## Subsection 1.1
More details.
"""

extractor = MarkdownExtractor(markdown_content)

# 1. Access sections using dictionary-style brackets
print(extractor["Section 1"])
# Output:
# # Section 1
# Some content here.
# ...

# 2. Access nested sections
print(extractor["Section 1"]["Subsection 1.1"])
# Output:
# ## Subsection 1.1
# More details.

# 3. List child headers
print(extractor.list())
# Output: ['Section 1']

print(extractor["Section 1"].list())
# Output: ['Subsection 1.1']

# 4. Access the full document (root)
print(extractor[""])
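The library's internals aren't described in this README, but the access pattern above maps naturally onto a tree of sections. The following is a minimal, standalone sketch of that idea, not the markdown_extract implementation: the `Section` class, `parse()` function and `text()` method are hypothetical names chosen for illustration. A stack of open sections decides where each header attaches, so bracket access walks down the tree and `list()` returns a node's direct children.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sketch, NOT the markdown_extract implementation.
# ATX header: up to 3 leading spaces, 1-6 '#', whitespace, the title,
# then an optional closing run of '#' preceded by whitespace.
HEADER_RE = re.compile(r"^ {0,3}(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


@dataclass
class Section:
    title: str
    level: int                                    # 0 for the untitled root
    lines: list = field(default_factory=list)     # header line + body text
    children: dict = field(default_factory=dict)  # child title -> Section

    def __getitem__(self, title: str) -> "Section":
        # Bracket access looks up a direct child by its header text.
        return self.children[title]

    def list(self):
        # Direct child titles in document order (dicts keep insertion order).
        return [title for title in self.children]

    def text(self) -> str:
        # This section's own lines followed by every descendant, recursively.
        parts = self.lines + [child.text() for child in self.children.values()]
        return "\n".join(parts)

    def __str__(self) -> str:
        return self.text()


def parse(markdown: str) -> Section:
    root = Section(title="", level=0)
    stack = [root]  # path from the root to the section currently being filled
    for line in markdown.splitlines():
        m = HEADER_RE.match(line)
        if m:
            level = len(m.group(1))
            node = Section(title=m.group(2), level=level, lines=[line])
            # Close every open section at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            stack[-1].children[node.title] = node
            stack.append(node)
        else:
            # Non-header lines belong to the innermost open section.
            stack[-1].lines.append(line)
    return root


if __name__ == "__main__":
    doc = "# Section 1\nSome content here.\n\n## Subsection 1.1\nMore details.\n"
    root = parse(doc)
    print(root.list())               # ['Section 1']
    print(root["Section 1"].list())  # ['Subsection 1.1']
    print(root["Section 1"]["Subsection 1.1"])
```

With a stack, a header closes every open section at its own level or deeper, and a `###` that directly follows a `#` simply becomes a child of that `#`. Duplicate sibling titles overwrite each other in this sketch; the README doesn't say how the library handles them.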

Features

  • Nested Parsing: Parses Markdown headers into a nested structure that follows their header levels.
  • Robust Extraction: Ignores "headers" that are actually inside:
    • Code blocks (```)
    • Tables
    • Math blocks ($$)
    • YAML front matter (---)
  • Indentation Support: Handles indented headers correctly.
  • Easy Access: Use bracket notation (extractor["Header"]) or the .get_section() method.
  • Discovery: Use .list() to see available child headers at any level.
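One way to get the "robust extraction" behaviour listed above is to track which kind of block the scanner is currently in and only test for headers outside code fences, $$ math blocks and YAML front matter. The following is a hypothetical sketch of that idea, not the library's code; `find_headers()` and `skip_front_matter()` are invented names. It assumes CommonMark's rule that an ATX header may be indented by at most three spaces (four or more makes an indented code block), which may differ from what the library means by indentation support. Table rows need no special case here, because a line starting with `|` never matches the header pattern.

```python
import re

# Hypothetical sketch, NOT the markdown_extract implementation.
# ATX header: up to 3 leading spaces, 1-6 '#', whitespace, the title,
# then an optional closing run of '#' preceded by whitespace.
HEADER_RE = re.compile(r"^ {0,3}(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
# Code fence: up to 3 leading spaces, then 3+ backticks or 3+ tildes.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def skip_front_matter(lines: list[str]) -> int:
    """Return the index of the first line after YAML front matter (0 if none)."""
    if lines and lines[0].strip() == "---":
        for j in range(1, len(lines)):
            if lines[j].strip() in ("---", "..."):
                return j + 1
    return 0  # no front matter, or it was never closed


def find_headers(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for lines that are real ATX headers."""
    lines = markdown.splitlines()
    headers: list[tuple[int, str]] = []
    fence = None     # the opening fence run while inside a code block
    in_math = False  # True while inside a $$ ... $$ block

    for line in lines[skip_front_matter(lines):]:
        stripped = line.strip()

        if fence is not None:
            m = FENCE_RE.match(line)
            # A closing fence uses the same character, is at least as long
            # as the opener, and has nothing else on the line.
            if (m and m.group(1)[0] == fence[0]
                    and len(m.group(1)) >= len(fence)
                    and stripped == m.group(1)):
                fence = None
            continue

        if in_math:
            if stripped.endswith("$$"):
                in_math = False
            continue

        m = FENCE_RE.match(line)
        if m:
            fence = m.group(1)
            continue

        if stripped.startswith("$$"):
            # Assumption: "$$x$$" on one line opens and closes at once;
            # any other line starting with "$$" opens a math block.
            if not (len(stripped) >= 4 and stripped.endswith("$$")):
                in_math = True
            continue

        m = HEADER_RE.match(line)
        if m:
            headers.append((len(m.group(1)), m.group(2)))
    return headers


if __name__ == "__main__":
    doc = "\n".join([
        "---",
        "# a YAML comment, not a header",
        "---",
        "# Real Header",
        "~~~",
        "# inside a code fence",
        "~~~",
        "$$",
        "# inside a math block",
        "$$",
        "    # indented four spaces: code, not a header",
        "   ## Second ##",
    ])
    print(find_headers(doc))  # [(1, 'Real Header'), (2, 'Second')]
```

Tracking a single "current block" state keeps the scan to one pass over the lines; nested constructs such as fences inside list items are out of scope for this sketch.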

Development

To run the tests:

python tests/run_tests.py
