Rust reimplementation of gapseq. For a detailed comparison with the original R/bash implementation, see COMPARISON.md.
gapsmith reconstructs genome-scale metabolic models from bacterial proteomes. Given a protein FASTA, it predicts metabolic pathways, detects transporters, assembles a draft stoichiometric model, infers a growth medium, and gap-fills the model so it can simulate growth.
gapsmith doall genome.faa.gz -f output/
This produces a gap-filled SBML model that loads directly in COBRApy, COBRAToolbox, or any SBML-compatible tool.
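As a quick check that the result is usable downstream, the following is a minimal Python sketch that locates the gap-filled SBML file and runs a plain FBA with COBRApy. It assumes COBRApy is installed and that the output directory holds exactly one `*-filled.xml` file (the name pattern listed among the output files in this README); the helper names `find_filled_model` and `simulate_growth` are hypothetical and not part of gapsmith.

```python
from pathlib import Path


def find_filled_model(output_dir):
    """Return the single *-filled.xml file in output_dir, or raise if there is not exactly one."""
    matches = sorted(Path(output_dir).glob("*-filled.xml"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one *-filled.xml in {output_dir}, found {len(matches)}"
        )
    return matches[0]


def simulate_growth(sbml_path):
    """Load the model with COBRApy and return (reaction count, objective value)."""
    import cobra  # assumption: COBRApy is installed (pip install cobra)

    model = cobra.io.read_sbml_model(str(sbml_path))
    solution = model.optimize()  # plain FBA on the model's own objective
    return len(model.reactions), solution.objective_value
```

A typical call would be `simulate_growth(find_filled_model("output/"))`; the COBRApy import is kept inside the function so the file lookup works even where COBRApy is not installed.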
An external sequence aligner (pick one):
| Tool | Install |
|---|---|
| BLAST+ | `apt install ncbi-blast+` or `conda install -c bioconda blast` |
| DIAMOND | `apt install diamond-aligner` or `conda install -c bioconda diamond` |
| MMseqs2 | `apt install mmseqs2` or `conda install -c bioconda mmseqs2` |
You also need a C++ toolchain (cmake, gcc/clang) for the bundled HiGHS LP
solver; any Linux or macOS system with developer tools installed will do.
# Pick the right target for your OS/arch; see Releases page.
TARGET=x86_64-unknown-linux-gnu
VER=$(curl -s https://api.github.com/repos/bio-ontology-research-group/gapsmith/releases/latest \
| grep '"tag_name"' | head -1 | sed 's/.*"\(v[^"]*\)".*/\1/')
curl -L https://github.com/bio-ontology-research-group/gapsmith/releases/download/$VER/gapsmith-$VER-$TARGET.tar.gz | tar xz
cd gapsmith-$VER-$TARGET
./gapsmith --version

Each release tarball bundles the binary and the curated data tables. See the releases page.
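The shell snippet above pulls the release tag out of the GitHub API response with `grep` and `sed`. A minimal Python sketch of the same lookup, parsing the JSON directly, is shown here. It assumes the response carries a `tag_name` field starting with `v`, as the shell pattern implies; the function names are hypothetical, and the URL layout is copied from the `curl` command above.

```python
import json
import urllib.request

REPO = "bio-ontology-research-group/gapsmith"  # taken from the install snippet


def parse_latest_tag(payload: str) -> str:
    """Extract the release tag from the API's JSON response."""
    tag = json.loads(payload)["tag_name"]
    if not tag.startswith("v"):
        # The shell version only matches tags that begin with "v".
        raise ValueError(f"unexpected tag format: {tag!r}")
    return tag


def tarball_url(tag: str, target: str) -> str:
    """Build the download URL using the naming pattern from the shell snippet."""
    return (
        f"https://github.com/{REPO}/releases/download/"
        f"{tag}/gapsmith-{tag}-{target}.tar.gz"
    )


def fetch_latest_tag() -> str:
    """Query the GitHub API for the latest release (needs network access)."""
    url = f"https://api.github.com/repos/{REPO}/releases/latest"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_latest_tag(resp.read().decode("utf-8"))
```

Calling `tarball_url(fetch_latest_tag(), "x86_64-unknown-linux-gnu")` would yield the same URL the shell commands download from. Parsing the JSON avoids depending on how the API happens to lay out its lines.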
cargo install --git https://github.com/bio-ontology-research-group/gapsmith.git gapsmith-cli

Installs gapsmith into ~/.cargo/bin/. You still need the data/
curation tables: clone the repo or grab them from a release tarball.
git clone https://github.com/bio-ontology-research-group/gapsmith.git
cd gapsmith
cargo build --release
# Binary: target/release/gapsmith, curated data in ./data/

The data directory comes in three parts, fetched independently:
- Curation tables (subex, medium rules, biomass templates, …), vendored in this repo under data/ (~1 MB). Used automatically when running from a checkout; bundled inside release tarballs.
- Large public reference tables (SEED reactions + metabolites, MNXref cross-refs, ~65 MB), fetched on demand from upstream gapseq's GitHub mirror:
  `gapsmith update-data -o path/to/dat`
- Sequence database (per-reaction FASTAs, ~2 GB), downloaded from Zenodo on demand:
  `gapsmith update-sequences -D path/to/dat/seq -t Bacteria`
After that you have a complete data directory and no longer need any
upstream gapseq checkout. Point all subsequent invocations at it with
--data-dir path/to/dat.
License-restricted data (MetaCyc pathways, KEGG, BiGG, BRENDA, VMH) is
opt-in; a forthcoming --accept-license flag will gate loading it.
# Full reconstruction pipeline (find → transport → draft → medium → fill)
gapsmith --data-dir path/to/dat doall genome.faa.gz -f output/ -A diamond
# Step by step
gapsmith --data-dir path/to/dat find -p all -A diamond -o output/ genome.faa
gapsmith --data-dir path/to/dat find-transport -A diamond -o output/ genome.faa
gapsmith --data-dir path/to/dat draft -r output/*-Reactions.tbl -t output/*-Transporter.tbl -o output/
gapsmith --data-dir path/to/dat medium -m output/*-draft.gmod.cbor -p output/*-Pathways.tbl
gapsmith --data-dir path/to/dat fill output/*-draft.gmod.cbor -n output/*-medium.csv -r output/*-Reactions.tbl -o output/

| File | Contents |
|---|---|
| `*-all-Reactions.tbl` | Per-reaction homology hits + pathway context |
| `*-all-Pathways.tbl` | Pathway completeness predictions |
| `*-Transporter.tbl` | Detected transporters |
| `*-draft.gmod.cbor` | Draft model (native format) |
| `*-draft.xml` | Draft model (SBML L3V1 + FBC2 + groups) |
| `*-medium.csv` | Predicted growth medium |
| `*-filled.gmod.cbor` | Gap-filled model (native format) |
| `*-filled.xml` | Gap-filled model (SBML) |
| `*-filled-added.tsv` | Reactions added during gap-filling |
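
After a run it can be handy to confirm that every expected file was written. The sketch below is a minimal Python check, assuming the file-name suffixes are exactly those in the output table above; `missing_outputs` is a hypothetical helper, not a gapsmith command.

```python
from pathlib import Path

# Suffixes copied from the output table; the leading "*" is whatever
# prefix gapsmith derives from the input genome.
EXPECTED_SUFFIXES = [
    "-all-Reactions.tbl",
    "-all-Pathways.tbl",
    "-Transporter.tbl",
    "-draft.gmod.cbor",
    "-draft.xml",
    "-medium.csv",
    "-filled.gmod.cbor",
    "-filled.xml",
    "-filled-added.tsv",
]


def missing_outputs(output_dir):
    """Return the expected suffixes that no file in output_dir ends with."""
    names = [p.name for p in Path(output_dir).iterdir() if p.is_file()]
    return [
        suffix
        for suffix in EXPECTED_SUFFIXES
        if not any(name.endswith(suffix) for name in names)
    ]
```

An empty list from `missing_outputs("output/")` means all nine file types are present; any suffix returned points at the pipeline stage that did not complete.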
| Command | Description |
|---|---|
| `doall` | Full pipeline: find → transport → draft → medium → fill |
| `find` | Pathway and reaction detection |
| `find-transport` | Transporter detection |
| `draft` | Build a draft metabolic model |
| `medium` | Rule-based growth medium inference |
| `fill` | Iterative gap-filling (pFBA + KO essentiality) |
| `fba` | FBA / pFBA on an existing model |
| `adapt` | Add/remove reactions or force growth on compounds |
| `pan` | Build a pan-draft model from multiple drafts |
| `batch-align` | Cluster N genomes + single alignment + per-genome TSVs |
| `doall-batch` | Run doall across many genomes in parallel (rayon + SLURM-array `--shard`) |
| `community per-mag` | Per-MAG FBA under a shared (union) medium; scales to 1000+ MAGs |
| `community cfba` | Compose N drafts into one community model; weighted-sum biomass |
| `update-sequences` | Sync reference sequence database from Zenodo |
| `update-data` | Fetch the large public reference tables (SEED, MNXref) |
| `convert` | Convert between CBOR and JSON model formats |
| `export-sbml` | Export a model as SBML |

Run any command with -h for full option documentation.
Full documentation is published at https://bio-ontology-research-group.github.io/gapsmith/.
Local copies:
| Document | Contents |
|---|---|
| User guide | Install, quick-start, per-subcommand recipes, troubleshooting |
| CLI reference | Every flag of every subcommand |
| Multi-genome & metagenome workflows | gspa integration, doall-batch for 1k–1M genomes, community per-mag vs cfba |
| Architecture | Crate dependency graph, data flow, LP plumbing |
| Feature matrix | R source → Rust module mapping, status per feature |
| Porting notes | Intentional deviations from upstream gapseq |
| Performance | Shipped optimisations, benchmarks, semantic-parity results |
| Comparison | Performance benchmarks and feature comparison with upstream |
GPL-3.0-or-later — same as gapseq.
If you use gapsmith, please cite the original gapseq paper:
Zimmermann J, Kaleta C, Waschina S. gapseq: informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models. Genome Biology 22, 81 (2021). https://doi.org/10.1186/s13059-021-02295-1