gapsmith

Rust reimplementation of gapseq. For a detailed comparison with the original R/bash implementation, see COMPARISON.md.

What it does

gapsmith reconstructs genome-scale metabolic models from bacterial proteomes. Given a protein FASTA, it predicts metabolic pathways, detects transporters, assembles a draft stoichiometric model, infers a growth medium, and gap-fills the model so it can simulate growth.

gapsmith doall genome.faa.gz -f output/

This produces a gap-filled SBML model that loads directly in COBRApy, the COBRA Toolbox, or any other SBML-compatible tool.

Install

Prerequisites

An external sequence aligner (pick one):

Tool	Install
BLAST+	`apt install ncbi-blast+` or `conda install -c bioconda blast`
DIAMOND	`apt install diamond-aligner` or `conda install -c bioconda diamond`
MMseqs2	`apt install mmseqs2` or `conda install -c bioconda mmseqs2`

You also need a C++ toolchain (cmake plus gcc or clang) for the bundled HiGHS LP solver. Any Linux or macOS system with developer tools installed will do.
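Before running gapsmith it can help to confirm that at least one aligner is actually on PATH. The following is a minimal sketch, assuming the aligners are installed under their usual executable names (blastp for BLAST+, diamond for DIAMOND, mmseqs for MMseqs2). `find_aligners` is a hypothetical helper, not part of gapsmith.

```shell
# Print the name of every given tool that is found on PATH.
# Assumption: aligners use their usual executable names
# (blastp for BLAST+, diamond, mmseqs for MMseqs2).
find_aligners() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

found=$(find_aligners blastp diamond mmseqs)
if [ -z "$found" ]; then
  echo "no aligner found; install BLAST+, DIAMOND, or MMseqs2" >&2
else
  echo "available aligners:"
  echo "$found"
fi
```

Any name it prints should be usable as the aligner; the quick-start examples select one with `-A diamond`.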

Option 1: pre-built binary (fastest)

# Pick the right target for your OS/arch; see Releases page.
TARGET=x86_64-unknown-linux-gnu
VER=$(curl -s https://api.github.com/repos/bio-ontology-research-group/gapsmith/releases/latest \
  | grep '"tag_name"' | head -1 | sed 's/.*"\(v[^"]*\)".*/\1/')
curl -L https://github.com/bio-ontology-research-group/gapsmith/releases/download/$VER/gapsmith-$VER-$TARGET.tar.gz | tar xz
cd gapsmith-$VER-$TARGET
./gapsmith --version

Each release tarball bundles the binary and the curated data tables. The Releases page lists the available targets.
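If you script the download, TARGET can be derived from `uname` instead of being set by hand. This is a sketch under assumptions: `detect_target` is a hypothetical helper, and the triples other than x86_64-unknown-linux-gnu are the standard Rust names for those platforms, which may or may not have a matching release asset. Check the Releases page before relying on them.

```shell
# Map an OS name and CPU architecture (as printed by `uname -s` and
# `uname -m`) to a Rust-style target triple.
# Assumption: release assets are named after these triples; only
# x86_64-unknown-linux-gnu is confirmed by the README.
detect_target() {
  os=$1
  arch=$2
  case "$arch" in
    x86_64|amd64)  cpu=x86_64 ;;
    arm64|aarch64) cpu=aarch64 ;;
    *) return 1 ;;
  esac
  case "$os" in
    Linux)  echo "$cpu-unknown-linux-gnu" ;;
    Darwin) echo "$cpu-apple-darwin" ;;
    *) return 1 ;;
  esac
}

TARGET=$(detect_target "$(uname -s)" "$(uname -m)") \
  || echo "unsupported platform; pick TARGET manually" >&2
echo "TARGET=$TARGET"
```

The function returns a non-zero status for platforms it does not recognise, so a wrapper script can stop before attempting the download.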

Option 2: cargo install (directly from git)

cargo install --git https://github.com/bio-ontology-research-group/gapsmith.git gapsmith-cli

This installs gapsmith into ~/.cargo/bin/. It does not install the curation tables under data/, so clone the repo or copy them from a release tarball.

Option 3: build from source

git clone https://github.com/bio-ontology-research-group/gapsmith.git
cd gapsmith
cargo build --release
# Binary: target/release/gapsmith, curated data in ./data/

Reference data

The reference data comes in three parts, fetched independently:

1. Curation tables (subex, medium rules, biomass templates, …): vendored in this repo under data/, about 1 MB. Used automatically when running from a checkout, and bundled inside release tarballs.
2. Large public reference tables (SEED reactions and metabolites, MNXref cross-references, about 65 MB): fetched on demand from upstream gapseq's GitHub mirror:
```
gapsmith update-data -o path/to/dat
```
3. Sequence database (per-reaction FASTAs, about 2 GB): downloaded from Zenodo on demand:
```
gapsmith update-sequences -D path/to/dat/seq -t Bacteria
```

After that you have a complete data directory and no longer need any upstream gapseq checkout. Point all subsequent invocations at it with --data-dir path/to/dat.
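The two fetch commands can be wrapped so that a fresh machine is set up in one call. This is only a sketch: `setup_data_dir` and the `GAPSMITH` override variable are hypothetical names introduced here, and the `mkdir -p` is a precaution, since the README does not say whether gapsmith creates the target directories itself.

```shell
# Fetch the public reference tables and the sequence database into one
# data directory, using only the flags shown in this README.
# GAPSMITH is a hypothetical override for the binary path; setting
# GAPSMITH=echo turns the function into a dry run that prints the commands.
setup_data_dir() {
  dat=$1
  bin=${GAPSMITH:-gapsmith}
  mkdir -p "$dat/seq" || return 1
  "$bin" update-data -o "$dat" || return 1
  "$bin" update-sequences -D "$dat/seq" -t Bacteria || return 1
}

# Usage:
#   setup_data_dir path/to/dat
#   gapsmith --data-dir path/to/dat doall genome.faa.gz -f output/
```

Stopping at the first failed step avoids starting the roughly 2 GB sequence download when the smaller table fetch has already failed.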

License-restricted data (MetaCyc pathways, KEGG, BiGG, BRENDA, VMH) is opt-in; a forthcoming --accept-license flag will gate loading it.

Quick start

# Full reconstruction pipeline (find → transport → draft → medium → fill)
gapsmith --data-dir path/to/dat doall genome.faa.gz -f output/ -A diamond

# Step by step
gapsmith --data-dir path/to/dat find -p all -A diamond -o output/ genome.faa
gapsmith --data-dir path/to/dat find-transport -A diamond -o output/ genome.faa
gapsmith --data-dir path/to/dat draft -r output/*-Reactions.tbl -t output/*-Transporter.tbl -o output/
gapsmith --data-dir path/to/dat medium -m output/*-draft.gmod.cbor -p output/*-Pathways.tbl
gapsmith --data-dir path/to/dat fill output/*-draft.gmod.cbor -n output/*-medium.csv -r output/*-Reactions.tbl -o output/

Output files

File	Contents
`*-all-Reactions.tbl`	Per-reaction homology hits + pathway context
`*-all-Pathways.tbl`	Pathway completeness predictions
`*-Transporter.tbl`	Detected transporters
`*-draft.gmod.cbor`	Draft model (native format)
`*-draft.xml`	Draft model (SBML L3V1 + FBC2 + groups)
`*-medium.csv`	Predicted growth medium
`*-filled.gmod.cbor`	Gap-filled model (native format)
`*-filled.xml`	Gap-filled model (SBML)
`*-filled-added.tsv`	Reactions added during gap-filling
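After a run, a quick check that every file in the table exists can catch a pipeline that stopped early. The sketch below assumes a full `doall` run writes all of the listed files into a single output directory; `missing_outputs` is a hypothetical helper, not a gapsmith command.

```shell
# Print the pattern of every expected output file that has no match
# in the given directory. Prints nothing when all files are present.
# Assumption: one doall run puts all files from the table into "$1".
missing_outputs() {
  dir=$1
  for suffix in all-Reactions.tbl all-Pathways.tbl Transporter.tbl \
      draft.gmod.cbor draft.xml medium.csv \
      filled.gmod.cbor filled.xml filled-added.tsv; do
    found=no
    # An unmatched glob stays literal, so test that the path exists.
    for f in "$dir"/*-"$suffix"; do
      if [ -e "$f" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      printf '*-%s\n' "$suffix"
    fi
  done
}

# Usage:
#   missing_outputs output/
```

An empty result means all nine file types were found; otherwise each printed pattern names an output to investigate.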

Subcommands

Command	Description
`doall`	Full pipeline: find → transport → draft → medium → fill
`find`	Pathway and reaction detection
`find-transport`	Transporter detection
`draft`	Build a draft metabolic model
`medium`	Rule-based growth medium inference
`fill`	Iterative gap-filling (pFBA + KO essentiality)
`fba`	FBA / pFBA on an existing model
`adapt`	Add/remove reactions or force growth on compounds
`pan`	Build a pan-draft model from multiple drafts
`batch-align`	Cluster N genomes, run a single alignment, and write per-genome TSVs
`doall-batch`	Run `doall` across many genomes in parallel (rayon + SLURM-array `--shard`)
`community per-mag`	Per-MAG FBA under a shared (union) medium — scales to 1000+ MAGs
`community cfba`	Compose N drafts into one community model; weighted-sum biomass
`update-sequences`	Sync reference sequence database from Zenodo
`update-data`	Fetch the large public reference tables (SEED, MNXref)
`convert`	Convert between CBOR and JSON model formats
`export-sbml`	Export a model as SBML

Run any command with -h for full option documentation.

Documentation

Full documentation is published at https://bio-ontology-research-group.github.io/gapsmith/.

Local copies:

Document	Contents
User guide	Install, quick-start, per-subcommand recipes, troubleshooting
CLI reference	Every flag of every subcommand
Multi-genome & metagenome workflows	gspa integration, `doall-batch` for 1k–1M genomes, community `per-mag` vs `cfba`
Architecture	Crate dependency graph, data flow, LP plumbing
Feature matrix	R source → Rust module mapping, status per feature
Porting notes	Intentional deviations from upstream gapseq
Performance	Shipped optimisations, benchmarks, semantic-parity results
Comparison	Performance benchmarks and feature comparison with upstream

License

GPL-3.0-or-later — same as gapseq.

Citation

If you use gapsmith, please cite the original gapseq paper:

Zimmermann J, Kaleta C, Waschina S. gapseq: informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models. Genome Biology 22, 81 (2021). https://doi.org/10.1186/s13059-021-02295-1
