Describe the Question
Hi Eager Team, this isn't a bug so much as a question. Why do the qualimap and damageprofiler modules run on the output of samtools_filter rather than dedup? Perhaps it's a matter of personal preference, but I'd rather my coverage/depth estimates and allele frequency/substitution rates be calculated independently of PCR duplicates. For depth, this avoids inflating confidence; for damage calculation, it avoids counting duplicate molecules, which are not independent allele observations. However, I'm not very familiar with either of these tools, so any clarity you can provide is greatly appreciated!
Test Data
An ancient mitochondrial genome enrichment (low abundance, high duplication). By default, the pipeline output reports I have a ~1800x genome with very messy damage signatures. This sample was chosen because it's an extreme example to highlight the difference.
"Expected" behavior
Again, this may just be my inexperience with these modules, but I'd rather they run on the dedup output, which reports a ~18x genome with the expected terminal damage for my library prep method.
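As a toy illustration of the depth concern, here is a minimal sketch of how amplified copies of the same molecule inflate mean depth. It is not how DeDup or Qualimap work internally: the function names and the read coordinates are made up, and duplicates are simply assumed to be reads sharing the same start, end, and strand.

```python
def mean_depth(reads, genome_length):
    """Mean per-base depth for reads given as (start, end, strand), end exclusive."""
    covered_bases = sum(end - start for start, end, _ in reads)
    return covered_bases / genome_length


def collapse_duplicates(reads):
    """Keep one read per identical (start, end, strand) triple (assumed duplicate rule)."""
    return sorted(set(reads))


# Hypothetical toy data: 3 unique molecules on a 100 bp reference,
# each amplified 100 times by PCR.
unique_molecules = [(0, 50, "+"), (25, 75, "-"), (50, 100, "+")]
reads = [read for read in unique_molecules for _ in range(100)]

raw_depth = mean_depth(reads, 100)
dedup_depth = mean_depth(collapse_duplicates(reads), 100)

print(f"raw depth:   {raw_depth}x")
print(f"dedup depth: {dedup_depth}x")
print(f"inflation:   {raw_depth / dedup_depth}-fold")
```

In this toy case the raw depth is 100-fold higher than the deduplicated depth, even though only three independent molecules were observed, which is the same order of difference as the ~1800x versus ~18x I see in my sample.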
To Reproduce
I attached my pipeline command here:
command.txt
Running revision: ace20a0 [dev]
DamageProfiler Comparison
Default EAGER Pipeline: DamagePlot_Default.pdf
Plotted from DeDup Output: DamagePlot_DeDup.pdf
Qualimap Comparison
Default EAGER Pipeline:

Plotted from DeDup Output:

Additional context
Runlog: nextflow.log
Thank you!