🔥 BlazeSeq

High-Performance FASTX Parsing for Mojo — Zero-Copy to GPU

BlazeSeq is a high-throughput FASTQ parser written in Mojo. It targets several GB/s of throughput from disk using zero-copy parsing, and also offers owned records and GPU-friendly batching for read pipelines. It supports streaming FASTA and samtools-style .fai index files (five- or six-column rows as written by faidx; index metadata only). Multithreaded gzip decompression is provided by rapidgzip, and validation is configurable. All of this is exposed through a single unified API.

✨ Key Features

- SIMD-accelerated scanning: vectorized from the ground up using Mojo's first-class SIMD support.
- Three parsing modes, so you can choose your trade-off between speed and convenience:
  - views(): zero-copy views (fastest, borrow semantics)
  - records(): owned records (thread-safe)
  - batches(): Structure-of-Arrays for GPU upload
- Compile-time validation toggles: enable or disable ASCII and quality-range checks at compile time for maximum throughput.
- Rapidgzip with parallel decoding: gzipped FASTQ (.fastq.gz) is decompressed in parallel across multiple threads for high throughput; tune the thread count with the parallelism parameter.
- FASTA and FAI: streaming FASTA parsing and .fai index files; see the API reference for FastaParser and FaiParser.
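
For orientation, the five- and six-column .fai rows mentioned in the introduction follow the samtools faidx layout: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH, plus QUALOFFSET when the indexed file is FASTQ. The sketch below parses such rows in plain Python with the standard library only. It illustrates the file format and is not BlazeSeq's FaiParser; the names FaiEntry, parse_fai_line, and base_offset are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaiEntry:
    name: str                         # sequence name
    length: int                       # total number of bases
    offset: int                       # byte offset of the first base
    linebases: int                    # bases per line
    linewidth: int                    # bytes per line, including the newline
    qualoffset: Optional[int] = None  # only present in FASTQ indexes


def parse_fai_line(line: str) -> FaiEntry:
    """Parse one tab-separated .fai row with five or six columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 columns, got {len(fields)}")
    numbers = [int(value) for value in fields[1:]]
    return FaiEntry(fields[0], *numbers)


def base_offset(entry: FaiEntry, pos: int) -> int:
    """Byte offset in the sequence file of the base at 0-based position pos."""
    full_lines, remainder = divmod(pos, entry.linebases)
    return entry.offset + full_lines * entry.linewidth + remainder
```

The base_offset helper shows why the index stores both LINEBASES and LINEWIDTH: the difference between them accounts for the newline bytes that must be skipped when seeking.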

Quick Start

Mojo package from repo (Pixi)

Use BlazeSeq as a Mojo dependency in your project. Install pixi first, then add BlazeSeq to your pixi.toml:

[dependencies]
blazeseq = { git = "https://github.com/MoSafi2/BlazeSeq", branch = "main" }

Then run pixi install and use the full Mojo API (e.g. FastqParser, FastaParser, FaiParser, views(), batches(), GPU batching).

Python bindings (experimental)

Python bindings are available via a wheel-only package on PyPI. They are experimental and may change. Install with pip install blazeseq or uv pip install blazeseq. Usage and API are documented in python/README.md.

🛠 Usage examples

# FastqParser with and without validation
pixi run mojo run examples/example_parser.mojo /path/to/file.fastq

# GPU Needleman-Wunsch global alignment (requires a GPU)
pixi run mojo run examples/nw_gpu/main.mojo

Count reads and base pairs

from blazeseq import FastqParser, FileReader
from pathlib import Path

def main() raises:
    var parser = FastqParser(FileReader(Path("data.fastq")), "sanger")
    var reads = 0
    var bases = 0
    for record in parser.records():
        reads += 1
        bases += len(record)
    print(reads, bases)

Maximum speed (validation off)

from blazeseq import FastqParser, ParserConfig, FileReader
from pathlib import Path

def main() raises:
    comptime config = ParserConfig(check_ascii=False, check_quality=False)
    var parser = FastqParser[config=config](FileReader(Path("data.fastq")), "generic")
    for view in parser.views():   # zero-copy
        _ = len(view)
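
To make concrete what is skipped when both checks are off, here is a plain-Python sketch of the two kinds of validation: an ASCII check over the record bytes and a quality-range check. It assumes the Sanger convention (Phred+33, quality bytes 33 to 126, that is '!' to '~'). The function names are hypothetical, and the ranges BlazeSeq applies for each schema may differ, so treat this as an illustration rather than the library's logic.

```python
SANGER_MIN = 33   # '!' encodes Phred 0
SANGER_MAX = 126  # '~' encodes Phred 93


def is_ascii(data: bytes) -> bool:
    """True if every byte is 7-bit ASCII."""
    return all(byte < 128 for byte in data)


def quality_in_range(quality: bytes, low: int = SANGER_MIN, high: int = SANGER_MAX) -> bool:
    """True if every quality byte lies within [low, high]."""
    return all(low <= byte <= high for byte in quality)


def phred_scores(quality: bytes, offset: int = SANGER_MIN) -> list:
    """Decode quality bytes to Phred scores by subtracting the offset."""
    return [byte - offset for byte in quality]
```

Both checks touch every byte of every record, which is why removing them at compile time can matter for throughput on trusted input.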

Batched (for GPU pipelines)

from blazeseq import FastqParser, FastqBatch, FileReader
from gpu.host import DeviceContext
from pathlib import Path

def main() raises:
    var ctx = DeviceContext()
    var parser = FastqParser(FileReader(Path("data.fastq")), schema="generic", batch_size=4096)
    for batch in parser.batches():
        # batch is a FastqBatch (Structure-of-Arrays)
        var device_batch = batch.to_device(ctx)   # GPU upload
        # Launch your GPU kernel here; see the examples/ directory
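
Structure-of-Arrays means the batch keeps one contiguous buffer per field instead of one object per record, which is the shape that bulk GPU uploads favor. The Python sketch below shows one common way to lay this out, using concatenated bytes plus an offsets array. The to_soa and record_at functions and the field names are hypothetical; the actual fields of FastqBatch may differ.

```python
def to_soa(records):
    """Pack (sequence, quality) byte pairs into contiguous buffers.

    Returns (seq_buf, qual_buf, offsets), where record i occupies
    the half-open range offsets[i]:offsets[i + 1] in both buffers.
    """
    seq_buf = bytearray()
    qual_buf = bytearray()
    offsets = [0]
    for seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        seq_buf += seq
        qual_buf += qual
        offsets.append(len(seq_buf))
    return bytes(seq_buf), bytes(qual_buf), offsets


def record_at(seq_buf, qual_buf, offsets, i):
    """Recover record i from the packed buffers."""
    start, end = offsets[i], offsets[i + 1]
    return seq_buf[start:end], qual_buf[start:end]
```

With this layout a whole batch moves to the device as a few large copies, and a kernel can locate any read from the offsets array alone.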

Reading gzip (rapidgzip, parallel decoding)

BlazeSeq uses RapidgzipReader for gzipped FASTQ. It performs parallel decompression: the compressed stream is split into chunks and multiple threads decode them concurrently, resulting in much higher throughput than single-threaded readers built on zlib or libdeflate.

from blazeseq import RapidgzipReader, FastqParser

def main() raises:
    # parallelism sets the number of decoder threads; 0 = use all available cores
    var reader = RapidgzipReader("data.fastq.gz", parallelism=4)
    var parser = FastqParser(reader^, "illumina_1.8")  # reader^ transfers ownership to the parser
    for record in parser.records():
        _ = record.id()

Architecture & Trade-offs

| Mode | Return Type | Copies Data? | Use When |
| --- | --- | --- | --- |
| `next_view()` / `views()` | `FastqView` | No | Streaming transforms (QC, filtering) where you process and discard. Not thread-safe. |
| `next_record()` / `records()` | `FastqRecord` | Yes | Simple scripting, building in-memory collections |
| `next_batch()` / `batches()` | `FastqBatch` (SoA) | Yes | GPU pipelines, parallel CPU operations |

Critical: FastqView spans are only valid until the next parser operation. Do not store them in collections or use them after the iteration advances.
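
The reason for this rule is that a view points into the parser's internal buffer, which is overwritten as parsing advances. The Python sketch below imitates the hazard with a memoryview over a reused bytearray. It is an analogy built on the standard library, not BlazeSeq code, and the refill and demo names are hypothetical.

```python
def refill(buf: bytearray, data: bytes) -> None:
    """Overwrite buf in place, as a parser does when it reads the next chunk."""
    for i, byte in enumerate(data):
        buf[i] = byte


def demo():
    buffer = bytearray(b"ACGT")   # stands in for the parser's internal buffer
    view = memoryview(buffer)     # zero-copy: shares memory with buffer
    owned = bytes(view)           # copy: independent of buffer
    refill(buffer, b"TTTT")       # the "parser" advances and reuses its buffer
    return bytes(view), owned


print(demo())  # prints: (b'TTTT', b'ACGT')
```

The view silently shows the new contents while the owned copy keeps the original read, which mirrors the choice between views() and records().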

Benchmarks

The benchmarks measure throughput (file-based and in-memory) and compare BlazeSeq with needletail, seq_io, and kseq. See benchmark/README.md for commands and details.

Documentation

API Reference: https://mosafi2.github.io/BlazeSeq/
The site is generated with Modo (plain markdown from mojo doc output) and Astro Starlight.
Examples: the examples/ directory includes parser usage, a writer, and GPU alignment.

Limitations

No multi-line FASTQ support: each record must span exactly four lines (standard Illumina/ONT format).
No support for paired-end reads yet (in progress).
No random seek within FASTQ/FASTA streams: sequence parsers are sequential; use MemoryReader for repeated scans. .fai index metadata is parsed separately with FaiParser.
Python package is wheel-only (no source build of the extension on install)
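
The four-line layout referred to in the first limitation is: a header starting with '@', the sequence, a separator starting with '+', and a quality string of the same length as the sequence. A minimal Python sketch of reading that layout follows. It is illustrative only, uses the hypothetical name read_fastq, and performs none of BlazeSeq's SIMD scanning or schema-specific validation.

```python
import io


def read_fastq(handle):
    """Yield (id, sequence, quality) from a text handle of four-line records."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("header line must start with '@'")
        if not plus.startswith("+"):
            raise ValueError("separator line must start with '+'")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].rstrip("\r\n"), seq, qual


data = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!!\n")
reads = list(read_fastq(data))
print(len(reads), sum(len(seq) for _, seq, _ in reads))  # prints: 2 6
```

A sequence wrapped over two lines would shift the '+' separator off the third line and fail the check above, which is the multi-line case the parser does not accept.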

Testing

Run the test suite with pixi:

pixi run test

Tests use the same valid/invalid FASTQ corpus as the BioJava, Biopython, and BioPerl FASTQ parsers. Multi-line FASTQ is not supported.

Project History

BlazeSeq is a ground-up rewrite of the now-archived MojoFastTrim, redesigned for:

Unified parser architecture (one parser, three modes)
GPU-oriented batch types
Compile-time configuration

Acknowledgements

The parsing algorithm is inspired by the approach of the Rust-based needletail, and was further optimized to use Mojo's first-class SIMD support.

License

This project is licensed under the MIT License.
