eisrael123/QAA

RNA-seq Quality Assessment and Alignment (QAA)

This repository contains a reproducible bioinformatics pipeline for the quality assessment, processing, and alignment of RNA-seq data derived from electric organ and skeletal muscle tissues of the electric fish Campylomormyrus compressirostris.

The workflow is designed to prepare raw sequencing data for downstream differential gene expression (DGE) analysis by performing rigorous quality control, adapter trimming, splice-aware alignment, and read quantification.

Dataset Information

The analysis processes raw RNA-seq reads from two specific libraries. The filenames have been standardized in this pipeline as follows:

Accession (SRA) | Sample Name | Description
SRR25630304     | rhy49       | C. compressirostris RNA-seq library (Project PRJNA1005245)
SRR25630399     | rhy106      | C. compressirostris RNA-seq library (Project PRJNA1005244)

Reference Genome:

Pipeline Workflow

The processing pipeline consists of five main stages:

  1. Quality Control (Pre-trim):
    • Tools: FastQC, Python (custom plotting).
    • Objective: Assess per-base sequence quality, N-content, and adapter contamination.
  2. Read Trimming:
    • Adapter Removal: Cutadapt is used to remove Illumina Universal Adapters.
    • Quality Trimming: Trimmomatic performs sliding-window trimming (window size 5, minimum average quality 15; SLIDINGWINDOW:5:15) and discards reads shorter than 35 bp (MINLEN:35).
  3. Alignment:
    • Tool: STAR (Spliced Transcripts Alignment to a Reference).
    • Objective: Splice-aware alignment to the C. compressirostris reference genome.
  4. Deduplication:
    • Tool: Picard MarkDuplicates.
    • Objective: Identify and flag PCR duplicates to reduce technical bias.
  5. Quantification:
    • Tool: HTSeq-count.
    • Objective: Generate raw read counts for gene features. Library strandedness is determined during analysis and passed to htseq-count through its --stranded option.
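The repository's custom plotting script for the Quality Control stage is not reproduced here. As a minimal sketch of what a per-base quality summary involves, the helpers below (hypothetical names, not the repository's code) read the quality line of each FASTQ record and average the decoded scores by read position. They assume Phred+33 encoding, which is standard for current Illumina data.

```python
import gzip
from typing import Iterable, Iterator, List


def fastq_quality_lines(path: str) -> Iterator[str]:
    """Yield the quality string (4th line) of every FASTQ record.

    Hypothetical helper; transparently opens gzip-compressed files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 3:  # header, sequence, '+', quality
                yield line.rstrip("\r\n")


def mean_quality_per_position(quality_lines: Iterable[str]) -> List[float]:
    """Average Phred score at each read position (Phred+33 assumed).

    Reads of different lengths are handled: each position is averaged
    only over the reads that reach it.
    """
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_lines:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach pos
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - 33  # decode Phred+33
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]
```

A call such as `mean_quality_per_position(fastq_quality_lines("rhy49_R1.fastq.gz"))` (file name hypothetical) returns one value per position, which can be passed directly to matplotlib's `plt.plot`. Streaming the file keeps memory use flat regardless of library size.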
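To make the Read Trimming parameters concrete, the sketch below approximates the effect of SLIDINGWINDOW:5:15 followed by MINLEN:35 on one read's quality scores. It is a simplified re-implementation for illustration only, not Trimmomatic's exact algorithm (Trimmomatic handles some edge cases differently), and the function name is hypothetical.

```python
from typing import List, Optional


def sliding_window_trim(quals: List[int], window: int = 5,
                        min_avg: float = 15.0,
                        min_len: int = 35) -> Optional[int]:
    """Return how many 5' bases to keep, or None if the read is dropped.

    Simplified, hypothetical approximation of Trimmomatic's
    SLIDINGWINDOW:<window>:<min_avg> followed by MINLEN:<min_len>:
    scan windows from the 5' end, cut at the start of the first window
    whose mean quality falls below min_avg, then drop short reads.
    """
    keep = len(quals)
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            keep = start
            break
    return keep if keep >= min_len else None
```

For example, a 50 bp read with quality 30 over its first 40 bases and 2 over the last 10 is cut to 38 bases, because the window starting at position 38 is the first whose average drops below 15.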
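After the Alignment stage, STAR writes a per-sample summary file, Log.final.out, whose lines have the form `label |<TAB>value`. A small parser like the hypothetical one below can collect these values (for example the uniquely mapped rate) so that rhy49 and rhy106 can be compared side by side. It assumes only that `label |` layout.

```python
from typing import Dict


def parse_star_final_log(text: str) -> Dict[str, str]:
    """Parse STAR's Log.final.out into {label: value} (hypothetical helper).

    Lines without '|' (blank lines and section headers such as
    'UNIQUE READS:') are skipped. Values are returned as strings,
    e.g. '85.32%', so callers decide how to convert them.
    """
    stats: Dict[str, str] = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        label, _, value = line.partition("|")
        stats[label.strip()] = value.strip()
    return stats
```

Usage, with a hypothetical output prefix: read `rhy49_Log.final.out` into a string and look up `parse_star_final_log(text)["Uniquely mapped reads %"]`.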
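Picard MarkDuplicates, used in the Deduplication stage, also writes a metrics file in which a `## METRICS CLASS` line is followed by a tab-separated header row and one row of values per library. The hypothetical parser below returns the first library's row. Note that Picard's PERCENT_DUPLICATION column is reported as a fraction between 0 and 1, not as a percentage.

```python
from typing import Dict


def parse_picard_dup_metrics(text: str) -> Dict[str, str]:
    """Return the first library's row from a MarkDuplicates metrics file.

    Hypothetical helper: finds the '## METRICS CLASS' line, then zips the
    tab-separated header row with the row that follows it.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no '## METRICS CLASS' section found")
```

For example, `float(parse_picard_dup_metrics(text)["PERCENT_DUPLICATION"])` gives the duplicate fraction for a sample.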
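One common way to assess strandedness in the Quantification stage is to run htseq-count twice on the same BAM file, once with `--stranded=yes` and once with `--stranded=reverse`, and then compare the percentage of reads assigned to genes in each run. The sketch below (hypothetical helper names) does that comparison. It relies on htseq-count's output format: one `feature<TAB>count` line per gene, followed by special counters whose names start with `__` (such as `__no_feature` and `__ambiguous`).

```python
from typing import Dict, Iterable


def summarize_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Split htseq-count output into gene-assigned vs. '__' counter totals."""
    assigned = 0
    unassigned = 0
    for line in lines:
        if not line.strip():
            continue
        feature, count = line.rstrip("\n").split("\t")
        if feature.startswith("__"):  # __no_feature, __ambiguous, ...
            unassigned += int(count)
        else:
            assigned += int(count)
    return {"assigned": assigned, "unassigned": unassigned}


def percent_assigned(lines: Iterable[str]) -> float:
    """Percentage of counted reads that landed on a gene feature."""
    summary = summarize_htseq_counts(lines)
    total = summary["assigned"] + summary["unassigned"]
    return 100.0 * summary["assigned"] / total if total else 0.0
```

If one setting assigns a clearly higher percentage of reads than the other, the library is most likely stranded in that orientation; similar, lower percentages for both settings are consistent with an unstranded library.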

Installation & Environment

All necessary software is managed via conda. To reproduce the analysis environment, create a new environment with the following dependencies:

conda create -n QAA -c conda-forge -c bioconda fastqc=0.12.1 cutadapt=5.0 trimmomatic=0.39 star picard samtools numpy matplotlib htseq
conda activate QAA
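After activating the environment, a quick check that every command-line tool is on PATH can catch an incomplete install before a long run. The script below is a hypothetical sketch. The command names are those the bioconda packages usually install, which is an assumption; adjust the list if your installation differs.

```python
import shutil
from typing import Iterable, List

# Assumed executable names for the conda packages listed above.
REQUIRED_TOOLS = ["fastqc", "cutadapt", "trimmomatic", "STAR",
                  "picard", "samtools", "htseq-count"]


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All pipeline tools found.")
```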

About

A pipeline, with the required packages and scripts, for quality-checking and trimming sequencing reads.
