FelixKrueger/TrimGalore

Trim Galore

Consistent quality and adapter trimming for next-generation sequencing data, with special handling for RRBS libraries.

Note

Trim Galore v2.0 is a faithful Rust rewrite — a single binary with zero external dependencies, designed as a drop-in replacement for v0.6.x scripts and pipelines. Same CLI, same output filenames, same report format. Adds poly-G auto-detection and trimming for 2-colour instruments, a generic poly-A trimmer, per-pair adapter auto-detection, and cleaner multi-adapter invocation (repeatable -a/-a2 instead of Perl's embedded-string syntax) — among other extensions. For details on what changed, benchmarks, and migration guidance, see the v2.0 migration notes.

Features

  • Adapter auto-detection — automatically identifies Illumina, Nextera, Small RNA, and BGI/DNBSEQ adapters from the first 1M reads. Stranded Illumina remains explicit (--stranded_illumina) because its sequence is ambiguous with Nextera.
  • Multi-adapter support — specify multiple adapters by repeating -a/-a2 or via -a "file:adapters.fa", with optional multi-round trimming (-n)
  • Quality trimming — Phred-based trimming from the 3' end (BWA algorithm)
  • Paired-end — single-pass processing of both reads with automatic pair validation
  • RRBS — MspI end-repair artifact removal, directional and non-directional libraries
  • Poly-G trimming — sequence-based removal of no-signal G-runs at the 3' end of Read 1 (and poly-C at the 5' end of Read 2) from 2-colour instruments (NovaSeq, NextSeq, NovaSeq X). Auto-detected from the data; opt-out with --no_poly_g
  • NextSeq / 2-colour quality trim — --nextseq N / --2colour N applies 2-colour-aware quality trimming (opt-in; replaces -q)
  • Poly-A trimming — built-in removal of poly-A tails without external tools; recommended for mRNA-seq / poly-A-selected RNA-seq libraries
  • Parallel processing — --cores N runs trimming and gzip compression in worker threads under an N+4 thread model (N workers + 2 decompressors + 1 batcher + 1 writer); near-linear speedup up to --cores 8 for paired-end runs, after which gzip-output I/O typically becomes the bottleneck
  • Clumpify compression (v2.2+) — opt-in --clumpify reorders reads by canonical 16-mer minimizer so similar reads share gzip dictionary windows; combined with --compression 1–9 it shrinks output 15–55% on fragment-clustered data (ATAC, Ribo, RRBS, RNA-seq, MiSeq amplicons). Coverage-diverse data (WGBS PE, scRNA-seq R2) regress — see Clumpy compression for the per-data-type guidance
  • FastQC integration — optional post-trimming quality reports built in via the bundled fastqc-rust library; produces FastQC 0.12.1-compatible HTML + ZIP outputs without requiring Java or an external fastqc on $PATH
  • MultiQC compatible — trimming reports parse cleanly in MultiQC dashboards (text + JSON)
  • Demultiplexing — 3' inline barcode demultiplexing
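The repeatable -a flags and the -a "file:..." syntax from the feature list can be sketched as follows. The adapter sequences and filenames here are illustrative placeholders, not values the tool would auto-detect:

```shell
# Write a small adapter FASTA (sequences are illustrative placeholders)
cat > adapters.fa <<'EOF'
>illumina
AGATCGGAAGAGC
>small_rna
TGGAATTCTCGG
EOF

# Equivalent invocations, shown as comments:
#   trim_galore -a AGATCGGAAGAGC -a TGGAATTCTCGG input.fastq.gz   # one -a per adapter
#   trim_galore -a "file:adapters.fa" -n 2 input.fastq.gz         # all adapters from FASTA, 2 rounds
echo "adapters.fa contains $(grep -c '^>' adapters.fa) adapters"
```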

Installation

From crates.io

Requires the Rust toolchain (1.88+):

cargo install trim-galore

From bioconda

conda install -c bioconda trim-galore

Build from source

git clone https://github.com/FelixKrueger/TrimGalore.git
cd TrimGalore
cargo build --release
# Binary is at target/release/trim_galore

Latest development version

To install the latest unreleased changes directly from the development branch:

cargo install --git https://github.com/FelixKrueger/TrimGalore --branch dev trim-galore --force

The --force flag overwrites any existing trim_galore binary (e.g. a v2.0.0 install from crates.io).

Docker

Multi-arch images (amd64 + arm64) are available from GitHub Container Registry:

docker run --rm -v "$PWD":/data -w /data ghcr.io/felixkrueger/trimgalore:latest trim_galore input.fastq.gz

FastQC is built into the binary itself via the bundled fastqc-rust library — no external fastqc or Java runtime needed in the image. Tags published: :latest (latest stable, currently v2.2.0), :v2.2.0 (pinned to a specific release), :beta (latest prerelease — only set during an active beta cycle), and :dev (every push to the dev branch). See the docs site install page for the full table.

Prebuilt binaries

Prebuilt binaries for Linux (x86_64, aarch64) and macOS (Apple Silicon) are available on the Releases page. Intel Mac users: install via cargo install trim-galore (local build) or use the Docker amd64 image.

Usage

# Single-end
trim_galore input.fastq.gz

# Paired-end
trim_galore --paired file_R1.fastq.gz file_R2.fastq.gz

# Parallel processing (recommended for large files)
# Near-linear speedup up to ~8 cores on v2.2.0; beyond that,
# gzip-output I/O on the storage layer typically becomes the bottleneck.
trim_galore --cores 8 --paired file_R1.fastq.gz file_R2.fastq.gz

# RRBS mode
trim_galore --rrbs --paired file_R1.fastq.gz file_R2.fastq.gz

# Run FastQC on trimmed output
trim_galore --fastqc input.fastq.gz

For the complete list of options:

trim_galore --help

Output files

Mode                               | Trimmed output                          | Reports
Single-end                         | *_trimmed.fq.gz                         | *_trimming_report.txt + *_trimming_report.json
Paired-end                         | *_val_1.fq.gz / *_val_2.fq.gz           | per-read text + JSON reports
Unpaired (with --retain_unpaired)  | *_unpaired_1.fq.gz / *_unpaired_2.fq.gz |

Output compression mirrors the input: gzipped input (*.fastq.gz) produces gzipped output (*.fq.gz); plain input (*.fastq) produces plain output (*.fq). Pass --dont_gzip to force plain output regardless. By default, gzip output is written at compression level 1 (fastest); pass --compression <N> (1–9) to override. Decompressed content is byte-identical regardless of level, but level-1 .fq.gz files are roughly 75% larger than level-9, in exchange for substantially faster trimming.
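The level-vs-size tradeoff above can be sanity-checked with plain gzip, which uses the same 1–9 level semantics; a toy record stands in for real output (the ~75% size gap only shows up on realistic inputs):

```shell
# Toy FASTQ record standing in for trimmed output
printf '@r1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' > reads.fq
gzip -1 -c reads.fq > level1.fq.gz
gzip -9 -c reads.fq > level9.fq.gz
# Decompressed content is byte-identical; only compressed size and speed differ
diff <(gzip -dc level1.fq.gz) <(gzip -dc level9.fq.gz) && echo "identical"
```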

Pass --clumpify to reorder reads inside each gzip member by canonical 16-mer minimizer so reads sharing similar sequence land adjacent on disk, letting gzip's 32 KB dictionary find longer back-references. The right configuration depends on what the trimmed FASTQ is used for:

# Pipeline intermediates (deleted after the run): reorder is essentially free
# and shrinks the file 15–35% on most data — net I/O win for the next step.
trim_galore --clumpify <input>

# Long-term storage / disk-constrained workdirs: add gzip level 6 for a 15–50%
# saving at roughly 4–6× the wall-clock time of the default level-1 output.
trim_galore --clumpify --compression 6 <input>

# Archival use, max compression
trim_galore --clumpify --compression 9 <input>

# Smaller output without the reorder cost (e.g. for 10x scRNA-seq, see docs)
trim_galore --compression 6 <input>

No information loss — only the on-disk order of records changes. Output records are byte-identical to the unsorted output and trimming reports are unaffected. --clumpify requires --cores >= 2.
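The no-information-loss claim can be checked by comparing record sets independently of on-disk order; a sketch using synthetic data and a shuffle to stand in for real trim_galore output (filenames and reads are hypothetical):

```shell
# Synthetic reads standing in for trimmed output
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' > plain.fq
# Simulate a clumpified copy: shuffle whole 4-line records, content unchanged
paste - - - - < plain.fq | shuf | tr '\t' '\n' > clumped.fq
gzip -kf plain.fq clumped.fq
# Flatten each record to one line and sort: identical sets => reorder lost nothing
norm() { gzip -dc "$1" | paste - - - - | sort; }
diff <(norm plain.fq.gz) <(norm clumped.fq.gz) && echo "records identical"
```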

Memory budget is controlled by the global --memory flag (default 1G); bigger budgets give bigger per-gzip-member sort runs and better compression up to roughly the uncompressed input size, with sharply diminishing returns above ~2 GB. With enough memory, --clumpify --compression 9 gets you within 1–2 percentage points of bbmap clumpify and stevekm/squish on the same data.

Intended for short reads (Illumina, AVITI). Long-read inputs (Oxford Nanopore, PacBio) and 10x scRNA-seq typically see no benefit or a small negative result; see Clumpy compression for per-data-type recommendations.

The JSON report contains the same statistics as the text report in a structured format (schema v1), designed for native parsing by MultiQC.

Documentation

Full documentation is published at https://www.trimgalore.com/

Credits

Trim Galore was developed at The Babraham Institute by @FelixKrueger, now part of Altos Labs.

License

GPL-3.0
