-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathleftover-adapter.qmd
More file actions
53 lines (39 loc) · 3.31 KB
/
leftover-adapter.qmd
File metadata and controls
53 lines (39 loc) · 3.31 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# Left over adapter
{#fig-leftover-adapter height=500px}
\newpage
```{r}
#| label: fig-template
#| fig-cap: Example of a smiley plot with high frequency of a single last base, due to a remaining base from an insufficiently clipped off adapter (as it was missing from the sequenced given to the adapter clipping tool). Data taken from unpublished data . Damage data generated using [DamageProfiler](/intro.qmd) and plotted using R and tidyverse packages [@Wickham2019-ot].
#| echo: false
#| warning: false
#| error: false
library(glossary)
glossary::glossary_path("glossary.yml")
glossary::glossary_popup("hover")
library(readr)
library(tidyr)
library(dplyr)
library(ggplot2)
library(patchwork)
source("assets/src/R/style.r")
source("assets/src/R/damage-profiler.r")
type_colours <- type_colours_dp
raw_5p <- readr::read_tsv("assets/data/leftover-adapter/5p_freq_misincorporations.txt", comment = "#")
raw_3p <- readr::read_tsv("assets/data/leftover-adapter/3p_freq_misincorporations.txt", comment = "#")
long_5p <- pivot_longer_freqmisincorporat(raw_5p, reverse = FALSE, type_colours = type_colours)
long_3p <- pivot_longer_freqmisincorporat(raw_3p, reverse = TRUE, type_colours = type_colours)
smiley_5p <- plot_longer_freqmisincorporat(long_5p, reverse = FALSE, type_colours)
smiley_3p <- plot_longer_freqmisincorporat(long_3p, reverse = TRUE, type_colours)
smiley_5p + smiley_3p
```
This smiley plot was found from a paired-end sequenced, double-stranded, non-UDG treated library.
It displays a very significant G to A frequency increase (exceeding 30%) on the first base of the 3' read termini, as well as a small increase for C to T misincorporations and other base changes.
This is likely due to a remaining base from an adapter that was not fully clipped off during the adapter trimming step of read preprocessing.
In the example above, the library was preprocessed with AdapterRemoval (v2.1.7) for adapter trimming and paired-end merging based on sequence overlap.
With paired end data where the DNA 'template' is shorter than the number of sequencing cycles (read length), AdapterRemoval first aligns the two reads to each other.
It then uses the information from the assumed 'symmetrical overhang' from the alignment, to identify and clip the adapters.
In other words, the sequence of the template should exactly match, and thus the rest of the read that does _not_ overlap is assumed to be adapter sequence.
A limitation of the tool is that asymmetric read pairs (e.g., reads with leading Ns or differing insert sizes that can 'shift' the template overlap) can result in incomplete adapter removal or loss of authentic fragment termini.
In this example, a single 3' terminal adapter base remained after adapter trimming due to asymmetric overlap of the templates, leading to an artificial inflation of the damage frequency.
In other tools, this can be cause of the adapter sequence was not correctly specified (e.g., one base left off).
In case you get a similar plot and you still have your raw FASTQ files (and you probably do), you can easily reprocess them with a cleaner approach — e.g. another tool such as fastp [@Chen2018-vg], Cutadapt [@Martin2011-ae] with Flash [@Magoc2011-xl] (or another preferred tool for trimming/merging).