This repository contains a reproducible bioinformatics pipeline for the quality assessment, processing, and alignment of RNA-seq data derived from electric organ and skeletal muscle tissues of the electric fish *Campylomormyrus compressirostris*.
The workflow is designed to prepare raw sequencing data for downstream differential gene expression (DGE) analysis by performing rigorous quality control, adapter trimming, splice-aware alignment, and read quantification.
The analysis processes raw RNA-seq reads from two specific libraries. The filenames have been standardized in this pipeline as follows:
| Accession (SRA) | Sample Name | Description |
|---|---|---|
| SRR25630304 | rhy49 | *C. compressirostris* RNA-seq library (Project PRJNA1005245) |
| SRR25630399 | rhy106 | *C. compressirostris* RNA-seq library (Project PRJNA1005244) |
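Downstream steps refer to samples by name rather than by accession. A small lookup like the one below can keep the renaming consistent. This is a minimal sketch, and the raw filename pattern `<accession>_<read>.fastq.gz` is a hypothetical assumption, not necessarily how the files were actually named.

```python
"""Map SRA run accessions to the standardized sample names used in this pipeline."""

# Accession -> sample name, as listed in the sample table.
SAMPLE_NAMES = {
    "SRR25630304": "rhy49",
    "SRR25630399": "rhy106",
}


def standardized_name(filename: str) -> str:
    """Replace a leading SRA accession in `filename` with its sample name.

    Assumes a hypothetical naming pattern such as 'SRR25630304_1.fastq.gz'
    (accession, '_', then the rest). Filenames whose prefix is not a known
    accession are returned unchanged.
    """
    accession, sep, rest = filename.partition("_")
    if accession in SAMPLE_NAMES:
        return SAMPLE_NAMES[accession] + sep + rest
    return filename
```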
Reference Genome:
- *Campylomormyrus compressirostris* genome assembly and annotation (GFF) were sourced from Dryad (DOI: 10.5061/dryad.c59zw3rcj).
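htseq-count needs two pieces of information from the annotation: which GFF feature type to count (`-t`) and which attribute identifies a gene (`-i`). Both vary between annotations. The sketch below tallies feature types and attribute keys in a GFF3 file. It assumes the standard 9-column layout with `key=value` attributes, and the function name is illustrative only.

```python
"""Summarize feature types and attribute keys in a GFF3 annotation."""
from collections import Counter
from typing import Iterable, Tuple


def summarize_gff(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count column-3 feature types and column-9 attribute keys."""
    feature_types: Counter = Counter()
    attribute_keys: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 9:
            continue  # skip malformed lines and any trailing FASTA section
        feature_types[fields[2]] += 1
        for attr in fields[8].split(";"):
            if "=" in attr:
                attribute_keys[attr.split("=", 1)[0].strip()] += 1
    return feature_types, attribute_keys
```

For example, `with open("annotation.gff") as fh: types, keys = summarize_gff(fh)` uses a hypothetical filename. If `gene` appears among the types and `ID` among the keys, they are natural candidates for `-t` and `-i`.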
The processing pipeline consists of five main stages:
- Quality Control (Pre-trim):
  - Tools: FastQC, Python (custom plotting).
  - Objective: Assess per-base sequence quality, N-content, and adapter contamination.
- Read Trimming:
  - Adapter Removal: Cutadapt is used to remove Illumina Universal Adapters.
  - Quality Trimming: Trimmomatic performs sliding-window trimming (window size 5, quality 15) and removes reads shorter than 35 bp.
- Alignment:
  - Tool: STAR (Spliced Transcripts Alignment to a Reference).
  - Objective: Splice-aware alignment to the *C. compressirostris* reference genome.
- Deduplication:
  - Tool: Picard MarkDuplicates.
  - Objective: Identify and flag PCR duplicates to reduce technical bias.
- Quantification:
  - Tool: HTSeq-count.
  - Objective: Generate raw read counts for gene features (strandedness assessed during analysis).
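FastQC's per-base quality module summarizes the Phred scores observed at each read position. The function below is a minimal sketch of that computation. It reports the mean only, whereas FastQC also shows quartiles and other statistics. It assumes strictly 4-line FASTQ records with Phred+33 encoding, and the function name is illustrative, not part of any tool in this pipeline.

```python
"""Per-base mean quality from FASTQ records (Phred+33)."""
from itertools import islice
from typing import Iterable, List


def per_base_mean_quality(lines: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position across all records.

    `lines` is any iterable of FASTQ lines (e.g. an open file); every 4th
    line, starting with the 4th, is a quality string. Reads may differ in
    length, so each position is averaged over the reads that reach it.
    """
    totals: List[int] = []
    counts: List[int] = []
    for qual in islice(lines, 3, None, 4):
        qual = qual.rstrip("\r\n")
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset  # decode ASCII to Phred score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]
```

For gzipped input, `gzip.open(path, "rt")` yields lines the same way. The returned list can then be plotted with matplotlib as a custom per-base quality figure.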
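Cutadapt's 3' adapter mode removes the adapter together with everything downstream of it, including adapters that are only partially present at the read end. The sketch below illustrates that idea under simplifying assumptions:

- Matching is exact only, whereas Cutadapt tolerates mismatches and indels by default.
- The minimum overlap is 3 bases, which is Cutadapt's default.
- `AGATCGGAAGAG` is the assumed Illumina Universal Adapter sequence.

```python
"""Simplified 3' adapter trimming in the spirit of Cutadapt's 3' adapter mode."""

ADAPTER = "AGATCGGAAGAG"  # assumed Illumina Universal Adapter sequence


def trim_3prime_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove the adapter and everything after it.

    Matches the full adapter anywhere in the read, or a prefix of it
    (at least `min_overlap` bases) at the very 3' end. Exact matches only:
    unlike Cutadapt, no mismatches or indels are tolerated.
    """
    full = seq.find(adapter)
    if full != -1:
        return seq[:full]
    # Partial adapter running off the 3' end: try the longest prefix first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq
```

In a real FASTQ record, the quality string must be cut to the same length as the trimmed sequence.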
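Trimmomatic's sliding-window step starts scanning at the 5' end of each read and cuts once the average quality inside the window falls below the threshold. MINLEN then discards reads that end up too short. The sketch below mirrors the parameters used here (window 5, quality 15, minimum length 35) in simplified form. It cuts at the start of the first failing window, whereas Trimmomatic may place the cut point differently, so trimmed lengths can differ slightly.

```python
"""Simplified sliding-window quality trimming (cf. Trimmomatic SLIDINGWINDOW:5:15 MINLEN:35)."""
from typing import List, Optional, Tuple


def sliding_window_trim(
    seq: str,
    qual: List[int],
    window: int = 5,
    threshold: float = 15.0,
    min_len: int = 35,
) -> Optional[Tuple[str, List[int]]]:
    """Scan 5' -> 3' and cut at the start of the first window whose mean
    quality is below `threshold`; return None if fewer than `min_len`
    bases remain (the read would be dropped)."""
    keep = len(seq)
    for start in range(0, len(seq) - window + 1):
        if sum(qual[start:start + window]) / window < threshold:
            keep = start
            break
    if keep < min_len:
        return None
    return seq[:keep], qual[:keep]
```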
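Splice-aware alignment appears in STAR's SAM/BAM output as `N` (skipped reference region) operations in the CIGAR string. STAR also summarizes collapsed junctions in its `SJ.out.tab` file. The helper below is a minimal sketch, not part of STAR. It recovers intron coordinates from a single alignment's 1-based position and CIGAR string.

```python
"""Extract intron spans implied by 'N' operations in a CIGAR string."""
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def introns_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) reference spans skipped by 'N'.

    `pos` is the 1-based leftmost mapping position (SAM column 4).
    """
    introns = []
    ref = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns
```

Soft clips (`S`) and insertions (`I`) do not advance the reference coordinate, which is why they are excluded from `REF_CONSUMING`.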
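Picard MarkDuplicates treats reads as duplicates when they share the same reference, orientation and unclipped 5' position (for pairs, both mates must match). It keeps the best-scoring read in each group and flags the rest rather than deleting them. The sketch below is a simplified single-end version under these assumptions:

- The `Read` record and its fields are hypothetical.
- The score is a plain sum of base qualities. Picard's default scoring is also based on summed base qualities but has additional details.

```python
"""Simplified single-end duplicate marking in the spirit of Picard MarkDuplicates."""
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Read:
    name: str
    chrom: str
    reverse: bool
    unclipped_5prime: int  # unclipped 5' coordinate (the end coordinate for reverse reads)
    qualities: List[int]
    duplicate: bool = False


def mark_duplicates(reads: List[Read]) -> List[Read]:
    """Group reads by (chromosome, strand, unclipped 5' position); in each
    group keep the read with the highest quality sum and flag the others.
    Reads are flagged in place, not removed."""
    groups: Dict[Tuple[str, bool, int], List[Read]] = {}
    for read in reads:
        key = (read.chrom, read.reverse, read.unclipped_5prime)
        groups.setdefault(key, []).append(read)
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.qualities))
        for read in group:
            read.duplicate = read is not best
    return reads
```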
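One way to assess strandedness is to run htseq-count on the same BAM twice, once with `--stranded=yes` and once with `--stranded=reverse`, and compare how many reads are assigned to genes. A stranded library assigns most reads in one orientation, while an unstranded one splits them roughly evenly. The sketch below parses the two count tables. The 0.8 cutoff is a hypothetical threshold, not a value taken from this analysis.

```python
"""Guess library strandedness from two htseq-count runs on the same BAM."""
from typing import Iterable


def assigned_reads(lines: Iterable[str]) -> int:
    """Total reads assigned to genes, skipping the '__' summary counters
    (e.g. __no_feature, __ambiguous) that htseq-count appends."""
    total = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("__"):
            continue
        total += int(line.rsplit("\t", 1)[1])  # count is the last column
    return total


def infer_strandedness(
    yes_counts: Iterable[str], reverse_counts: Iterable[str], cutoff: float = 0.8
) -> str:
    """Return the --stranded value ('yes', 'reverse' or 'no') that best fits.

    If one orientation captures at least `cutoff` of the combined assigned
    reads from both runs, call the library stranded in that orientation;
    otherwise call it unstranded.
    """
    yes = assigned_reads(yes_counts)
    rev = assigned_reads(reverse_counts)
    if yes + rev == 0:
        raise ValueError("no reads assigned to any gene in either run")
    if yes / (yes + rev) >= cutoff:
        return "yes"
    if rev / (yes + rev) >= cutoff:
        return "reverse"
    return "no"
```

For example, `infer_strandedness(open("rhy49_yes.txt"), open("rhy49_reverse.txt"))` uses hypothetical filenames.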
All necessary software is managed via conda. To reproduce the analysis environment, create a new environment with the following dependencies:
```bash
conda create -n QAA fastqc=0.12.1 cutadapt=5.0 trimmomatic=0.39 star picard samtools numpy matplotlib htseq
conda activate QAA
```
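After activating the environment, a quick check that each tool resolves on `PATH` can catch a failed or partial install before the pipeline runs. The executable names below are assumptions based on the usual bioconda packaging. For example, `STAR` is upper-case, and HTSeq installs an `htseq-count` command.

```python
"""Check that the pipeline's command-line tools are on PATH."""
import shutil
from typing import List, Sequence

# Assumed executable names for the conda packages listed above.
TOOLS = ["fastqc", "cutadapt", "trimmomatic", "STAR", "picard", "samtools", "htseq-count"]


def missing_tools(tools: Sequence[str] = TOOLS) -> List[str]:
    """Return the tools that shutil.which cannot locate."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("All tools found." if not missing else "Missing: " + ", ".join(missing))
```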