
iamamofa/HumanG_Fusion


HumanG_Fusion: RNA-Seq Fusion Detection Pipeline

Pipeline Overview

HumanG_Fusion is a modular, Nextflow-based pipeline for human RNA-Seq fusion detection, featuring:

  • Preprocessing (FastQC, trimming)
  • Optional Kraken2 decontamination
  • STAR alignment (2-pass)
  • Fusion calling (STAR-Fusion, Arriba, FusionCatcher)
  • Postprocessing & merging with gene annotation

📦 Repository Layout

HumanG_Fusion/
├── README.md
├── installation_setup/
│   ├── Docker_setup/
│   ├── bash_setup/
│   └── python_setup/
├── docker/
├── bin/
├── conf/
├── modules/
├── scripts/
├── bash_scripts/
├── tools/
├── sample_data/
├── main.nf
├── pipeline.config.sh
└── run_pipeline.sh

🚀 Getting Started

1️⃣ Installation Setup

Choose one of the following methods:


🐳 Docker

Best for: Reproducible, containerized execution.

Setup:

cd installation_setup/Docker_setup
# Follow README.md for build/run instructions
docker build -t humang_fusion:latest .

Or, use Docker Compose:

cd docker_compose_setup
docker compose build

📦 Bash

Best for: Quick setup on Linux.

Setup:

cd installation_setup/bash_setup
chmod +x install.sh
./install.sh

This will:

  • Install required apt packages (if sudo)
  • Install Miniconda under ~/miniconda3 (if missing)
  • Create a bioinf environment via mamba
  • Install all conda + pip packages

Activate environment:

source ~/miniconda3/etc/profile.d/conda.sh
conda activate bioinf
# or, if fallback venv was used:
source ~/pyenv-bioinf/bin/activate

🐍 Python

Best for: Python-centric environments.

Setup:

cd installation_setup/python_setup
chmod +x install.py
./install.py

This will:

  • Install Miniconda (if missing)
  • Create a bioinf environment
  • Install all dependencies

Activate environment:

source ~/miniconda3/etc/profile.d/conda.sh
conda activate bioinf
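
Whichever installation route was used, it can help to confirm that the tools are actually on the PATH before launching a run. The function below is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and the list of executables in the usage comment is an assumption based on the tools named in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HumanG_Fusion): print every named
# executable that cannot be found on PATH and return non-zero if any is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not on PATH: $tool"
            status=1
        fi
    done
    return "$status"
}

# Example call; the executable names are assumptions, adjust to your install:
#   missing_tools nextflow STAR fastqc cutadapt kraken2 arriba STAR-Fusion
```

Run it after activating the environment; an empty output and a zero exit status mean every listed tool was found.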

2️⃣ Data Preparation

Place your FASTQ files in sample_data/:

sample_data/
├── sample1_R1.fastq.gz
├── sample1_R2.fastq.gz
├── sample2_R1.fastq.gz
└── sample2_R2.fastq.gz
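
The layout above implies paired-end data with an `_R1`/`_R2` naming scheme. As a minimal sketch, assuming that naming holds for all samples, the hypothetical helper below (`check_fastq_pairs` is not part of the repository) reports any R1 file whose R2 mate is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every *_R1.fastq.gz in a directory that lacks a
# matching *_R2.fastq.gz. Returns non-zero if at least one mate is missing.
check_fastq_pairs() {
    local dir="${1:-sample_data}"
    local missing=0 r1 r2
    for r1 in "$dir"/*_R1.fastq.gz; do
        # When nothing matches, the glob stays literal; skip that case.
        [ -e "$r1" ] || continue
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate: $r2"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_fastq_pairs sample_data` before a run catches incomplete uploads early. Note that an empty directory passes the check, since there is nothing to pair.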


2.1 Reference Preparation

Please visit ➡️ References


3️⃣ Pipeline Configuration

Edit pipeline.config.sh with your paths:

#!/bin/bash
export READS="./sample_data/*_R1.fastq.gz"
export OUTDIR="./results"
export STAR_INDEX="/refs/STAR_index_GRCh38"
export GENOME_FASTA="/refs/GRCh38.fa"
export GTF="/refs/gencode.v41.annotation.gtf"
export STAR_FUSION_CTAT_LIB="/refs/ctat_resource_lib"
export KRAKEN2_DB="/refs/kraken2_db"
export THREADS=8
export KRAKEN_DECONTAM=true
export RUN_FUSIONCATCHER=false
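
A typo in one of these paths usually surfaces only after the pipeline has started. The sketch below shows one way to check the configuration up front; `validate_config` is a hypothetical helper, not part of the repository, and it assumes the variable names shown above and that the Kraken2 database is a directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the variables exported by pipeline.config.sh.
# Prints one line per problem and returns non-zero if any problem was found.
validate_config() {
    local status=0 name value
    # Variables that must be non-empty.
    for name in READS OUTDIR STAR_INDEX GENOME_FASTA GTF STAR_FUSION_CTAT_LIB THREADS; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name"
            status=1
        fi
    done
    # Reference paths that must exist on disk.
    for name in STAR_INDEX GENOME_FASTA GTF STAR_FUSION_CTAT_LIB; do
        eval "value=\${$name:-}"
        if [ -n "$value" ] && [ ! -e "$value" ]; then
            echo "not found: $name=$value"
            status=1
        fi
    done
    # The Kraken2 database only matters when decontamination is enabled.
    if [ "${KRAKEN_DECONTAM:-false}" = "true" ] && [ ! -d "${KRAKEN2_DB:-}" ]; then
        echo "not found: KRAKEN2_DB=${KRAKEN2_DB:-}"
        status=1
    fi
    return "$status"
}
```

A typical use would be `. ./pipeline.config.sh && validate_config` before starting the run.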

4️⃣ Run the Pipeline

🏃 Automated (Recommended)

chmod +x run_pipeline.sh
./run_pipeline.sh

This script will:

  • Source your config
  • Run Nextflow with the correct profile (Docker, Conda, or Bash)
  • Execute all steps in order
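
The steps listed above can be sketched roughly as follows. This is not the contents of `run_pipeline.sh`: the function name `run_pipeline`, the `DRY_RUN` switch, the lowercase profile names, and the `--reads`/`--outdir` pipeline parameters are all assumptions made for illustration. Only `nextflow run` and its `-profile` option are standard Nextflow.

```shell
#!/usr/bin/env bash
# Illustrative wrapper only; the real run_pipeline.sh may differ.
# Usage: run_pipeline [config-file] [profile]
run_pipeline() {
    local config="${1:-./pipeline.config.sh}" profile="${2:-docker}"
    [ -f "$config" ] || { echo "config not found: $config" >&2; return 1; }
    # Profile names are assumed; check conf/ for the ones actually defined.
    case "$profile" in
        docker|conda|bash) ;;
        *) echo "unknown profile: $profile" >&2; return 2 ;;
    esac
    # Load READS, OUTDIR and the other exported settings.
    . "$config"
    # --reads and --outdir are hypothetical pipeline parameter names.
    set -- nextflow run main.nf -profile "$profile" --reads "$READS" --outdir "$OUTDIR"
    echo "+ $*"
    # DRY_RUN=1 prints the command without launching Nextflow.
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}
```

Running `DRY_RUN=1 run_pipeline ./pipeline.config.sh conda` prints the command that would be executed, which is a cheap way to review the settings before committing compute time.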

🛠 Manual Step-by-Step

If you prefer, you can run each step manually using the scripts in bash_scripts/ or scripts/.


🔧 Pipeline Modules

| Module | Description |
|---|---|
| preprocess | FastQC, trimming (cutadapt) |
| kraken_decontam | Optional: remove non-human reads |
| star_align | STAR 2-pass alignment |
| fusion_callers | STAR-Fusion, Arriba, FusionCatcher (optional) |
| postprocess | Merge calls, annotate genes, produce summary TSV |

📂 Outputs

  • results/{sample}/fastq_trimmed/: Trimmed FASTQ
  • results/{sample}/align/: BAM files
  • results/{sample}/fusion/: Caller-specific outputs
  • results/reports/: Merged, annotated fusion calls
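
As a quick completeness check after a run, the sketch below verifies that the per-sample directories listed above exist. `check_sample_outputs` is a hypothetical helper, not part of the repository, and it only tests for the directories, not for the files inside them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that the per-sample output directories exist.
# Usage: check_sample_outputs <outdir> <sample>
check_sample_outputs() {
    local outdir="$1" sample="$2" status=0 sub
    for sub in fastq_trimmed align fusion; do
        if [ ! -d "$outdir/$sample/$sub" ]; then
            echo "missing: $outdir/$sample/$sub"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_sample_outputs ./results sample1` prints nothing and returns zero when all three directories are present.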

🧪 Testing & Validation

  • Test on small datasets with known fusions.
  • Inspect candidate fusions in IGV to rule out alignment artefacts.

📝 Notes

  • STAR-Fusion CTAT resource library must be provided.
  • Kraken2 DB required if using decontamination.
  • mygene is used for gene annotation.

🚀 Next Steps

  • Add Singularity/Docker push automation
  • Add Nextflow Tower profiles
  • Create a tests/ folder with test FASTQs

💬 Questions?

Open an issue or contact the maintainers!

Justiceoheneamofa@gmail.com / kdanquah@atu.edu.gh / kdanquah@noguchi.ug.edu.gh

About

HumanG_Fusion is a Nextflow DSL2 pipeline for human RNA-Seq fusion detection. It supports preprocessing, optional Kraken2 decontamination, STAR alignment, and fusion callers (STAR-Fusion, Arriba, FusionCatcher). Results are merged and annotated with Ensembl gene data for reproducible, high-confidence analysis.
