docs/output.md
Lines changed: 17 additions & 1 deletion
@@ -108,6 +108,8 @@ For other non-default columns, hover over the column name for further descriptio
[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your raw reads. It provides information about the quality score distribution across your reads and the per-base sequence content (%T/A/G/C) as sequenced. You also get information about adapter contamination and other overrepresented sequences.
+
You will receive output for each supplied FASTQ file.
+
When dealing with ancient DNA data, the MultiQC plots for FastQC will often show lots of 'warning' or 'failed' samples. You can generally disregard this sort of information, as we are dealing with very degraded and metagenomic samples which have artefacts that violate the FastQC 'quality definitions' while still being valid data for aDNA researchers. Instead, you will _normally_ be looking for 'global' patterns across all samples of a sequencing run to check for library construction or sequencing failures. The decision on whether an individual sample has 'failed' or not should be made by the user after checking all the plots themselves (e.g. if the sample is consistently an outlier to all others in the run).
For further reading and documentation see the [FastQC help](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
@@ -243,6 +245,8 @@ In the case of dual-indexed paired-end sequencing, it is likely poly-G tails are
While the MultiQC report has multiple plots for FastP, we will only look at GC content as that's the functionality we use currently.
+
You will receive output for each supplied FASTQ file.
+
#### GC Content
This line plot shows the average GC content (Y axis) across each nucleotide of the reads (X axis). There are two buttons per read (i.e. 2 for single-end, and 4 for paired-end), representing before and after the poly-G tail trimming.
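
As a rough illustration of what this plot represents, the sketch below computes the percentage of G/C bases at each read position for a handful of made-up reads. The function name `gc_content_per_position` and the example reads are hypothetical, and this is not how FastP itself computes the values.

```python
def gc_content_per_position(reads):
    """Return the percentage of G/C bases at each position along the reads."""
    max_len = max((len(r) for r in reads), default=0)
    gc = [0] * max_len      # number of G/C bases seen at each position
    total = [0] * max_len   # number of reads long enough to reach each position
    for read in reads:
        for i, base in enumerate(read.upper()):
            total[i] += 1
            if base in "GC":
                gc[i] += 1
    return [100.0 * g / t for g, t in zip(gc, total)]


# Hypothetical reads; the trailing G's mimic a poly-G tail.
reads = ["ACGT", "GGGG", "ATGG"]
profile = gc_content_per_position(reads)
```

A run of G's at the read ends pushes the per-position GC content up towards the last positions, which is the kind of pattern you would expect the 'before trimming' line to show when poly-G tails are present.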
@@ -274,6 +278,8 @@ Quality trimming (or 'truncating') involves looking at ends of reads for low-con
Length filtering involves removing any read that does not reach the number of bases specified by a particular value.
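
A minimal sketch of the length-filtering idea, assuming reads are plain strings and using a hypothetical threshold of 30 bases. This is only an illustration and not AdapterRemoval's implementation.

```python
def filter_by_length(reads, min_length):
    """Split reads into those reaching min_length bases and those that do not."""
    retained = [r for r in reads if len(r) >= min_length]
    discarded = [r for r in reads if len(r) < min_length]
    return retained, discarded


# Hypothetical reads of 35, 30 and 12 bases, with an assumed threshold of 30.
reads = ["A" * 35, "C" * 30, "G" * 12]
retained, discarded = filter_by_length(reads, min_length=30)
```

Here the two reads of 30 bases or more are retained and the 12-base read is discarded.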
+
You will receive output for each FASTQ file supplied for single-end data, or for each pair of merged FASTQ files for paired-end data.
+
#### Retained and Discarded Reads Plot
These stacked bar plots are unfortunately a little confusing when displayed in MultiQC. However, they are relatively straightforward once you understand each category. They can be displayed as counts of reads per AdapterRemoval read category, or as percentages of the same values. Each forward(/reverse) file combination is displayed once.
@@ -317,6 +323,8 @@ With paired-end ancient DNA sequencing runs You expect to see a slight increase
This module reports, as raw counts, how your DNA reads mapped to your reference genome.
+
You will receive output for each _library_. This means that if you use TSV input and have one library sequenced over multiple lanes, the lanes are merged and you will get the mapping statistics of all lanes in one value.
+
#### Flagstat Plot
This dot plot shows different statistics, with the number of reads (typically as a multiple, e.g. millions or thousands) represented by dots on the X axis.
@@ -335,6 +343,8 @@ The remaining rows will be 0 when running `bwa aln` as these characteristics of
### DeDup
+
You will receive output for each _library_. This means that if you use TSV input and have one library sequenced over multiple lanes, the lanes are merged and you will get the mapping statistics of all lanes of the library in one value.
+
#### Background
DeDup is a duplicate removal tool which searches for PCR duplicates and removes them from your BAM file. We remove these duplicates because otherwise you would be artificially increasing your coverage, and subsequently your confidence in genotyping, by counting lab artefacts that are not biologically meaningful. DeDup looks for reads with the same start and end coordinates, and checks whether they have exactly the same sequence. The main difference between DeDup and e.g. `samtools markduplicates` is that DeDup considers _both_ ends of a read, not just the start position, so it is more precise in removing actual duplicates without penalising the often already low amount of aDNA data.
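
To make the difference between the two strategies concrete, here is a simplified sketch that collapses reads either by both ends or by start position only. The tuple layout `(chrom, start, end, strand)`, the function name `deduplicate` and the example reads are assumptions made for illustration; the sketch ignores the sequence comparison and is not DeDup's actual implementation.

```python
def deduplicate(reads, use_both_ends=True):
    """Keep the first read seen for each position key.

    Each read is a hypothetical (chrom, start, end, strand) tuple.
    """
    seen = set()
    kept = []
    for chrom, start, end, strand in reads:
        if use_both_ends:
            key = (chrom, start, end, strand)  # start AND end must match
        else:
            key = (chrom, start, strand)       # start position only
        if key not in seen:
            seen.add(key)
            kept.append((chrom, start, end, strand))
    return kept


reads = [
    ("chr1", 100, 150, "+"),
    ("chr1", 100, 150, "+"),  # same start and end: treated as a duplicate
    ("chr1", 100, 135, "+"),  # same start, different end: a distinct molecule
]
both_ends = deduplicate(reads)
start_only = deduplicate(reads, use_both_ends=False)
```

With both ends considered, two reads are kept; with the start position only, the shorter but distinct read is also thrown away and a single read remains.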
@@ -364,6 +374,8 @@ Things to look out for:
### Preseq
+
You will receive output for each deduplicated _library_. This means that if you use TSV input and have one library sequenced over multiple lanes, the lanes are merged and you will get the mapping statistics of all lanes of the library in one value.
+
#### Background
Preseq is a collection of tools that allow assessment of the complexity of the library, where complexity means the number of unique molecules in your library (i.e. not molecules with the exact same length and sequence).
@@ -390,6 +402,8 @@ Plateauing can be caused by a number of reasons:
### DamageProfiler
+
You will receive output for each deduplicated _library_. This means that if you use TSV input and have one library sequenced over multiple lanes, the lanes are merged and you will get the mapping statistics of all lanes of the library in one value.
+
#### Background
DamageProfiler is a tool which calculates a variety of standard 'aDNA' metrics from a BAM file. The primary plots here are the misincorporation and length distribution plots. Ancient DNA undergoes depurination and hydrolysis, causing fragmentation of molecules into gradually shorter fragments, and cytosine to thymine deamination damage, which occurs on the subsequent single-stranded overhangs at the ends of molecules.
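
The idea behind a misincorporation plot can be sketched as follows: for each position counted from the 5' end of a read, take the reference cytosines and ask what fraction of them is read as thymine. The sketch assumes gapless, forward-strand alignments supplied as hypothetical `(reference, read)` string pairs; the function name `c_to_t_frequency` is made up, and this is not how DamageProfiler itself is implemented.

```python
def c_to_t_frequency(alignments, n_positions=5):
    """Fraction of reference C's read as T at each of the first n_positions."""
    c_ref = [0] * n_positions    # reference C's observed at each position
    c_to_t = [0] * n_positions   # of those, how many the read shows as T
    for ref, read in alignments:
        for i in range(min(n_positions, len(ref), len(read))):
            if ref[i] == "C":
                c_ref[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    # Positions with no reference C are reported as 0.0.
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_ref)]


# Hypothetical (reference, read) pairs, 5' end first.
alignments = [
    ("CACGT", "TACGT"),  # C>T at position 0
    ("CCGTA", "TCGTA"),  # C>T at position 0, unchanged C at position 1
    ("CAGCT", "CAGCT"),  # no damage
]
freqs = c_to_t_frequency(alignments, n_positions=3)
```

In this toy input, two of the three reference cytosines at the first position are read as thymine, while the cytosines further into the read are unchanged, mimicking the elevated C to T frequency expected at the ends of damaged molecules.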
@@ -431,14 +445,16 @@ When looking at the length distribution plots, keep in mind the following:
### QualiMap
-
#### QualiMap
+
#### Background
Qualimap is a tool which provides statistics on the quality of the mapping of your reads to your reference genome. It allows you to assess how well covered your reference genome is by your data, both in 'fold' depth (average number of times a given base on the reference is covered by a read) and 'percentage' (the percentage of all bases on the reference genome that is covered at a given fold depth). These outputs allow you to decide whether you have enough good-quality data for downstream applications like genotyping, and how to adjust the parameters for those tools accordingly.
> NB: Neither fold coverage nor percent coverage on their own is sufficient to assess whether you have a high quality mapping. Abnormally high fold coverages of a smaller region, such as highly conserved genes or un-removed-adapter-containing reference genomes, can artificially inflate the mean coverage, while a high percent coverage is not useful if all bases of the genome are covered at just 1x coverage.
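
The note above can be illustrated with two made-up per-base depth lists for a ten-base reference. The function names `mean_fold_coverage` and `percent_covered` and the depth values are hypothetical, and the sketch is not Qualimap's implementation.

```python
def mean_fold_coverage(depths):
    """Average number of reads covering each reference base."""
    return sum(depths) / len(depths)


def percent_covered(depths, min_depth=1):
    """Percentage of reference bases covered by at least min_depth reads."""
    return 100.0 * sum(1 for d in depths if d >= min_depth) / len(depths)


# One small region piled up with reads, the rest of the reference uncovered.
piled_up = [50, 50, 0, 0, 0, 0, 0, 0, 0, 0]
# Every base covered, but only once.
thin = [1] * 10

piled_up_stats = (mean_fold_coverage(piled_up), percent_covered(piled_up))
thin_stats = (mean_fold_coverage(thin), percent_covered(thin))
thin_at_5x = percent_covered(thin, min_depth=5)
```

The first example has a mean coverage of 10x although only 20% of the reference is covered at all; the second covers 100% of the reference, but none of it at 5x or more. Looking at either value alone would give a misleading impression in both cases.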
Note that many of the statistics from this module are displayed in the General Stats table (see above), as they represent single values that are not plottable.
+
You will receive output for each _sample_. This means you will get statistics of deduplicated values of all types of libraries combined in a single value (i.e. non-UDG treated, full-UDG, paired-end, single-end all together).
+
#### Coverage Histogram
This plot shows on the X axis the range of fold coverages that the bases of the reference genome are possibly covered by. The Y axis shows the number of bases that were covered at the given fold coverage depth as indicated on the X axis.
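
As a small sketch of the data behind such a histogram, the snippet below counts how many reference bases are covered at each fold depth. The per-base depth list is made up and the function name `coverage_histogram` is hypothetical; Qualimap derives these values from the BAM file itself.

```python
from collections import Counter


def coverage_histogram(depths):
    """Map each fold coverage to the number of reference bases at that depth."""
    return dict(sorted(Counter(depths).items()))


# Hypothetical per-base depths for a seven-base reference.
depths = [0, 1, 1, 2, 2, 2, 5]
histogram = coverage_histogram(depths)
```

Each key of `histogram` corresponds to a fold coverage value, and each value to the number of bases covered at that depth.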