Module Order of DeDup Qualimap DamageProfiler #227

@ktmeaton

Describe the Question
Hi Eager Team, this isn't a bug so much as a question. Why do the qualimap and damageprofiler modules run on the output of samtools_filter rather than dedup? Perhaps it's a matter of personal preference, but I'd rather my coverage/depth estimates and allele frequency/substitution rates be calculated independently of PCR duplicates. For depth, this avoids inflating confidence in the coverage; for damage calculation, it avoids counting duplicate molecules, which are not independent allele observations. However, I'm not very familiar with either of these tools, so any clarity you can provide is greatly appreciated!
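
To make the depth concern concrete, here is a toy sketch. It is not how DeDup or Qualimap work internally: the read coordinates, the 100-fold amplification, and the helper names `mean_depth` and `collapse_duplicates` are all made up for illustration, and duplicates are simply assumed to be reads sharing the same start, end, and strand.

```python
def mean_depth(reads, genome_length):
    """Mean per-base depth for reads given as (start, end, strand), end exclusive."""
    covered_bases = sum(end - start for start, end, _ in reads)
    return covered_bases / genome_length


def collapse_duplicates(reads):
    """Keep one read per identical (start, end, strand) triple (toy assumption)."""
    return sorted(set(reads))


# Hypothetical data: 3 distinct molecules on a 100 bp reference,
# each amplified 100 times by PCR.
unique_molecules = [(0, 50, "+"), (25, 75, "-"), (50, 100, "+")]
reads = [molecule for molecule in unique_molecules for _ in range(100)]

depth_raw = mean_depth(reads, genome_length=100)
depth_dedup = mean_depth(collapse_duplicates(reads), genome_length=100)

print(f"depth before duplicate removal: {depth_raw}")
print(f"depth after duplicate removal:  {depth_dedup}")
```

In this toy case the depth computed before duplicate removal is 100 times the depth computed afterwards, even though only three independent molecules were ever observed.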

Test Data
An ancient mitochondrial genome enrichment (low abundance, high duplication). By default, the pipeline output reports a ~1800X genome with very messy damage signatures. I chose this sample because it is an extreme example that highlights the difference.

"Expected" behavior
Again, this may be inexperience on my part, since I'm not familiar with these modules. I'd rather they run on the dedup output, which reports a ~18X genome with the expected terminal damage for my library prep method.

To Reproduce
I attached my pipeline command here:
command.txt
Running revision: ace20a0 [dev]

DamageProfiler Comparison
Default EAGER Pipeline: DamagePlot_Default.pdf
Plotted from DeDup Output: DamagePlot_DeDup.pdf

Qualimap Comparison
Default EAGER Pipeline:
Qualimap_Default
Plotted from DeDup Output:
Qualimap_DeDup

Additional context
Runlog: nextflow.log

Thank you!
