
Endogenous DNA (%) calculation #522

@stschiff



When calculating the Endogenous DNA (%) reported in the "General Statistics" table of the MultiQC output, Eager divides the number of mapped reads by the total number of reads remaining after AdapterRemoval. However, AdapterRemoval also performs size-filtering, discarding reads below a minimum length. As a result, the ratio of mapped to "total" reads is computed over only the subset of reads that survived filtering, which inflates the estimate.

In a recent test, we had a very poorly preserved sample where, out of more than 5 million reads sequenced, only 150K passed AdapterRemoval, mostly because the reads were very short. Eager reported an Endogenous DNA of 5%, which is quite good. However, when divided by the actual raw number of reads sequenced, the result is 0.144%, which is very poor and only just good enough for in-solution SNP-Capture.
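As a quick sanity check on the numbers above, the sketch below back-calculates the mapped-read count implied by the reported 5% and the raw read count implied by 0.144%. It assumes the rounded figures from the issue (150,000 reads after AdapterRemoval, 5%, 0.144%) are exact; all variable names are illustrative.

```python
# Rounded figures quoted in the issue, treated as exact for this illustration.
reads_after_adapterremoval = 150_000
reported_endogenous_pct = 5.0    # mapped / reads after AdapterRemoval
proposed_endogenous_pct = 0.144  # mapped / raw sequenced reads

# Mapped reads implied by the reported 5%: 5% of 150,000.
mapped_reads = reads_after_adapterremoval * reported_endogenous_pct / 100

# Raw read count implied by the 0.144% figure.
implied_raw_reads = mapped_reads / (proposed_endogenous_pct / 100)

# How much the current definition inflates the estimate.
inflation_factor = reported_endogenous_pct / proposed_endogenous_pct

print(f"mapped reads:      {mapped_reads:,.0f}")
print(f"implied raw reads: {implied_raw_reads:,.0f}")
print(f"inflation factor:  {inflation_factor:.1f}x")
```

The implied raw count of roughly 5.2 million is consistent with "more than 5 million reads sequenced", and the reported value overstates the proposed one by roughly 35-fold.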

Long story short: I think we should consider which definition of "Endogenous DNA (%)" is most reasonable. I would argue we should follow the principle of "least surprise". In this case, a figure of 5% certainly surprised me, given the poor preservation of that sample, and I would have preferred to see 0.144% reported, i.e. the ratio of mapped, length-filtered reads to total sequenced reads. This definition makes the most economic sense, as it is literally the fraction of sequenced reads that can be used for analysis as mapped reads. A counter-argument would be that the total number of sequenced reads may include artefacts from library building, such as adapter dimers, so the resulting estimate wouldn't really be the "biological" estimate of endogenous DNA. However, I don't think that argument holds, since "artefacts" like adapter dimers are real and should justifiably reduce the endogenous DNA of that library, just as, say, contamination by bacteria or other environmental DNA would.

So I would vote for changing the definition of "Endogenous DNA (%)" to the more conservative ratio of

mapped reads (post-AdapterRemoval) / total reads (pre-AdapterRemoval)
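A minimal sketch contrasting the two definitions. The function names are hypothetical (not part of Eager or MultiQC), and it assumes the mapped and total counts are in the same units (e.g. single-end reads, or paired reads already merged and counted consistently), which real pipelines must handle explicitly.

```python
def endogenous_pct(mapped_reads: int, total_reads: int) -> float:
    """Percentage of reads that mapped, relative to the chosen denominator."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def endogenous_pct_current(mapped: int, reads_post_adapterremoval: int) -> float:
    # Current behaviour as described in the issue: the denominator is the
    # read count after AdapterRemoval, i.e. after size-filtering.
    return endogenous_pct(mapped, reads_post_adapterremoval)


def endogenous_pct_proposed(mapped: int, raw_reads: int) -> float:
    # Proposed definition: the denominator is all sequenced reads before
    # AdapterRemoval, so reads lost to filtering (e.g. adapter dimers,
    # very short inserts) count against the library.
    return endogenous_pct(mapped, raw_reads)


# Hypothetical counts for illustration only.
raw, kept, mapped = 1_000_000, 400_000, 100_000
current = endogenous_pct_current(mapped, kept)
proposed = endogenous_pct_proposed(mapped, raw)
print(current)   # 25.0
print(proposed)  # 10.0
```

Because the raw count can never be smaller than the post-filtering count, the proposed value is always less than or equal to the current one, which is what makes it the more conservative choice.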

Labels: enhancement, help wanted, question