Skip to content

Commit dfb6738

Browse files
committed
Minor ordering changes for eradability and changed command for plotPMD to plotPMD.v2.R
1 parent 99e6cc8 commit dfb6738

1 file changed

Lines changed: 38 additions & 12 deletions

File tree

authentication.qmd

Lines changed: 38 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -207,16 +207,24 @@ Now, after we have detected an interesting *Y. pestis* hit, we would like to fol
207207
::: {.callout-note title="Self guided: data preparation" collapse=true}
208208
```bash
209209
cd /<path/<to>/authentication/bowtie2
210+
```
210211

212+
```
211213
## Download reference genome
212214
NCBI=https://ftp.ncbi.nlm.nih.gov; ID=GCF_000222975.1_ASM22297v1
213215
wget $NCBI/genomes/all/GCF/000/222/975/${ID}/${ID}_genomic.fna.gz
214216
```
215217
:::
216218

219+
Change into the `bowtie2/` folder.
220+
217221
```bash
218222
cd /<path/<to>/authentication/bowtie2
223+
```
219224

225+
And then prepare the reference genome and align the reads against the reference.
226+
227+
```bash
220228
## Prepare reference genome and build Bowtie2 index
221229
gunzip GCF_000222975.1_ASM22297v1_genomic.fna.gz; echo NC_017168.1 > region.bed
222230
seqtk subseq GCF_000222975.1_ASM22297v1_genomic.fna region.bed > NC_017168.1.fasta
@@ -245,7 +253,7 @@ Load R by running `R` in your terminal
245253
R
246254
```
247255

248-
Note the following may take a minute or so to run.
256+
And run the following R code to generate a coverage plot.
249257

250258
```{r, eval = F}
251259
# Read output of samtools depth commans
@@ -270,17 +278,22 @@ mtext(paste0(round((sum(df$N_reads > 0) / length(df$N_reads)) * 100, 2),
270278
"% of genome covered"), cex = 0.8)
271279
```
272280

281+
::: {.callout-warning}
282+
Note the command above may take a minute or so to run.
283+
:::
284+
285+
In the R script above, we simply split the reference genome into *N_tiles* tiles and compute the breadth of coverage (number of reference nucleotides covered by at least one read normalised by the total length) locally in each tile.
286+
By visualising how the local breadth of coverage changes from tile to tile, we can monitor the distribution of the reads across the reference genome. In the evenness of coverage figure above, the reads seem to cover all parts of the reference genome uniformly, which is a good evidence of true-positive detection, even though the total mean breadth of coverage is low due to the low total number of reads.
287+
288+
![](assets/images/chapters/authentication/Evenness_of_coverage.png)
289+
273290
Once finished examining the plot you can quit R
274291

275292
```bash
276293
## Press 'n' when asked if you want to save your workspace image.
277294
quit()
278295
```
279296

280-
![](assets/images/chapters/authentication/Evenness_of_coverage.png)
281-
282-
In the R script above, we simply split the reference genome into *N_tiles* tiles and compute the breadth of coverage (number of reference nucleotides covered by at least one read normalised by the total length) locally in each tile. By visualising how the local breadth of coverage changes from tile to tile, we can monitor the distribution of the reads across the reference genome. In the evenness of coverage figure above, the reads seem to cover all parts of the reference genome uniformly, which is a good evidence of true-positive detection, even though the total mean breadth of coverage is low due to the low total number of reads.
283-
284297
### Alignment quality
285298

286299
In addition to evenness and breadth of coverage, it is very informative to monitor how well the metagenomic reads map to a reference genome. Here one can control for **mapping quality** ([MAPQ](https://samtools.github.io/hts-specs/SAMv1.pdf) field in the BAM-alignments) and the number of mismatches for each read, i.e. **edit distance**.
@@ -307,7 +320,9 @@ hist(as.numeric(readLines("mapq.txt")), col = "darkred", breaks = 100)
307320

308321
![](assets/images/chapters/authentication/MAPQ.png)
309322

310-
Note that MAPQ scores are computed slightly differently for Bowtie and BWA, so they are not directly comparable, however, for both MAPQ ~ 10-30, as in the histograms below, indicates good affinity of the DNA reads to the reference genome. here we provide some examples of how typical MAPQ histograms for Bowtie2 and BWA alignments can look like:
323+
Note that MAPQ scores are computed slightly differently for Bowtie and BWA, so they are not directly comparable, however, for both MAPQ ~ 10-30, as in the histograms below, indicates good affinity of the DNA reads to the reference genome.
324+
325+
We can see some examples of how typical MAPQ histograms for Bowtie2 and BWA alignments on real data below.
311326

312327
![](assets/images/chapters/authentication/mapq.png)
313328

@@ -355,9 +370,18 @@ Deamination profile of a damaged DNA demonstrate an enrichment of C / T polymorp
355370
mapDamage -i Y.pestis_sample10.sorted.bam -r NC_017168.1.fasta -d mapDamage_results/ --merge-reference-sequences --no-stats
356371
```
357372

373+
::: {.callout-note}
374+
During the summer school, you can view the two generated PDFs by running the following command
375+
376+
```bash
377+
evince mapDamage_results/Fragmisincorporation_plot.pdf
378+
evince mapDamage_results/Length_plot.pdf
379+
```
380+
:::
381+
358382
![](assets/images/chapters/authentication/deamination.png)
359383

360-
maDamage delivers a bunch of useful statistics, among other read length distribution can be checked. A typical mode of DNA reads should be within a range 30-70 base-pairs in order to be a good evidence of DNA fragmentation. Reads longer tha 100 base-pairs are more likely to originate from modern contamination.
384+
mapDamage delivers a bunch of useful statistics, among other read length distribution can be checked. A typical mode of DNA reads should be within a range 30-70 base-pairs in order to be a good evidence of DNA fragmentation. Reads longer tha 100 base-pairs are more likely to originate from modern contamination.
361385

362386
![](assets/images/chapters/authentication/read_length.png)
363387

@@ -382,18 +406,20 @@ pmd_scores <- read.delim("PMDscores.txt", header = FALSE, sep = "\t")
382406
hist(pmd_scores$V4, breaks = 1000, xlab = "PMDscores")
383407
```
384408

409+
![](assets/images/chapters/authentication/pmd_scores.png)
410+
411+
Typically, reads with PMD scores greater than 3 are considered to be reliably ancient, i.e. damaged, and can be extracted for taking a closer look. Therefore PMDtools is great for separating ancient reads from modern contaminant reads.
412+
385413
Once finished examining the plot you can quit R
386414

387415
```bash
388416
## Press 'n' when asked if you want to save your workspace image.
389417
quit()
390418
```
391419

392-
![](assets/images/chapters/authentication/pmd_scores.png)
393-
394-
Typically, reads with PMD scores greater than 3 are considered to be reliably ancient, i.e. damaged, and can be extracted for taking a closer look. Therefore PMDtools is great for separating ancient reads from modern contaminant reads.
420+
As mapDamage, PMDtools can also compute deamination profile. However, the advantage of PMDtools that it can compute deamination profile for UDG / USER treated samples (with the flag *--CpG*). For this purpose, PMDtools uses only CpG sites which escape the treatment, so deamination is not gone completely and there is a chance to authenticate treated samples. Computing deamination pattern with PMDtools can be achieved with the following command line commands.
395421

396-
As mapDamage, PMDtools can also compute deamination profile. However, the advantage of PMDtools that it can compute deamination profile for UDG / USER treated samples (with the flag *--CpG*). For this purpose, PMDtools uses only CpG sites which escape the treatment, so deamination is not gone completely and there is a chance to authenticate treated samples. Computing deamination pattern with PMDtools can be achieved with the following command line (please note that the scripts *pmdtools.0.60.py* and *plotPMD.v2.R* can be downloaded from the github repository here https://github.com/pontussk/PMDtools):
422+
::: *plotPMD.v2.R* can be downloaded from the github repository here https://github.com/pontussk/PMDtools):
397423

398424
```bash
399425
samtools view Y.pestis_sample10.bam | pmdtools --platypus > PMD_temp.txt
@@ -402,7 +428,7 @@ samtools view Y.pestis_sample10.bam | pmdtools --platypus > PMD_temp.txt
402428
We can then run simple R commands directly from the terminal (without loading R itself with) the following.
403429

404430
```bash
405-
R CMD BATCH plotPMD
431+
R CMD BATCH plotPMD.v2.R
406432
```
407433

408434
![](assets/images/chapters/authentication/PMD_Skoglund_et_al_2015_Current_Biology.png)

0 commit comments

Comments
 (0)