We have previously tried normalizing the read counts but saw no big impact. Presumably this is because the read-count distribution is roughly exponential, so sequencing depth has little effect as long as it doesn't vary by orders of magnitude, which it typically doesn't for our samples.
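For concreteness, a minimal sketch of the kind of per-sample depth normalization meant here (counts per million); the DataFrame layout and variable names are illustrative assumptions, not taken from the actual pipeline:

```python
import pandas as pd

def cpm(counts: pd.DataFrame) -> pd.DataFrame:
    """Counts-per-million: scale each sample (column) by its total read count.
    Assumes taxa/features as rows and samples as columns (hypothetical layout)."""
    library_sizes = counts.sum(axis=0)          # total reads per sample
    return counts.div(library_sizes, axis=1) * 1_000_000
```

If library sizes only differ by, say, 20-50% between samples, this rescaling barely changes the relative picture, which would be consistent with the "no big impact" observation.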
The reads fractionate as:
- regular human mapped reads (hisat2)
- additional human mapped reads (bowtie2)
- reads mapped to human contigs (blast vs human)
- reads mapped to classifiable contigs
- reads mapped to unclassified contigs
- reads unmapped
The unmapped reads are presumably dominated by poor-quality reads, as are the reads mapped to unclassified contigs. A good definition of "total reads" would be the sum of all fractions except that one. Alternatively, the typically dominant human-mapped reads could be used exclusively as the normalization target. The question here is whether manual normalization helps at all, since the downstream analysis would be done by DESeq & friends, which prefer raw read counts and would process a virus just as they would process any other expressed gene.
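A sketch of the two candidate denominators, assuming the per-sample fraction sizes sit in a DataFrame with one row per sample; the column names are hypothetical stand-ins for the fractions listed above, not names from the pipeline:

```python
import pandas as pd

def normalization_denominator(fractions: pd.DataFrame,
                              exclude: list[str] | None = None) -> pd.Series:
    """Sum the per-sample read-fraction sizes, dropping any excluded fractions,
    to form a per-sample normalization denominator."""
    cols = [c for c in fractions.columns if c not in (exclude or [])]
    return fractions[cols].sum(axis=1)

# Option 1: all fractions except the unmapped reads.
denom = normalization_denominator(fractions, exclude=["unmapped"])

# Option 2: only the (typically dominant) human-mapped fractions.
denom = fractions[["hisat2_human", "bowtie2_human", "blast_human_contigs"]].sum(axis=1)

# Either way, viral counts per sample would then be scaled by `denom`, e.g.:
# viral_norm = viral_counts.div(denom, axis=0)
```

Note that if the counts are fed to DESeq-style tools afterwards, this pre-scaling is exactly what those tools advise against, which is the open question above.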