Commit 4206591

Add final citations
1 parent d23093a commit 4206591

2 files changed

Lines changed: 61 additions & 38 deletions

assets/references/introduction-to-ngs-sequencing.bib

Lines changed: 54 additions & 0 deletions
@@ -1820,3 +1820,57 @@ @article{Rohland2015-xn
 issn = {0962-8436,1471-2970},
 language = {en}
 }
+
+@article{Dohm2008-rf,
+title = {Substantial biases in ultra-short read data sets from
+high-throughput {DNA} sequencing},
+author = {Dohm, Juliane C and Lottaz, Claudio and Borodina, Tatiana and
+Himmelbauer, Heinz},
+journal = {Nucleic Acids Research},
+publisher = {Oxford Academic},
+volume = 36,
+number = 16,
+pages = {e105},
+abstract = {Abstract. Novel sequencing technologies permit the rapid
+production of large sequence data sets. These technologies are
+likely to revolutionize genetics an},
+month = sep,
+year = 2008,
+url = {https://dx.doi.org/10.1093/nar/gkn425},
+keywords = {helicobacter; datasets; beta vulgaris; genome; sequence analysis,
+dna},
+doi = {10.1093/nar/gkn425},
+issn = {0305-1048,1362-4962},
+language = {en}
+}
+
+@article{Gihawi2023-hu,
+title = {Major data analysis errors invalidate cancer microbiome findings},
+author = {Gihawi, Abraham and Ge, Yuchen and Lu, Jennifer and Puiu, Daniela
+and Xu, Amanda and Cooper, Colin S and Brewer, Daniel S and
+Pertea, Mihaela and Salzberg, Steven L},
+journal = {mBio},
+publisher = {American Society for Microbiology},
+volume = 14,
+number = 5,
+pages = {e0160723},
+abstract = {IMPORTANCE: Recent reports showing that human cancers have a
+distinctive microbiome have led to a flurry of papers describing
+microbial signatures of different cancer types. Many of these
+reports are based on flawed data that, upon re-analysis,
+completely overturns the original findings. The re-analysis
+conducted here shows that most of the microbes originally
+reported as associated with cancer were not present at all in the
+samples. The original report of a cancer microbiome and more than
+a dozen follow-up studies are, therefore, likely to be invalid.},
+month = oct,
+year = 2023,
+url = {https://journals.asm.org/doi/10.1128/mbio.01607-23},
+keywords = {bioinformatics; cancer; computational biology; metagenomics;
+microbiome},
+doi = {10.1128/mbio.01607-23},
+pmc = {PMC10653788},
+pmid = 37811944,
+issn = {2161-2129,2150-7511},
+language = {en}
+}

introduction-to-ngs-sequencing.qmd

Lines changed: 7 additions & 38 deletions
@@ -5,10 +5,6 @@ number-depth: 2
 bibliography: assets/references/introduction-to-ngs-sequencing.bib
 ---
 
-::: {.callout-important}
-🚧 This page is still under construction 🚧
-:::
-
 Next generation sequencing (NGS) revolutionised biology by providing rapid and cheap access to huge amounts of DNA sequence data.
 One unexpected benefit of the technology used in Illumina NGS sequencers was that it was also ideal for sequencing ultra-short ancient DNA.
 
@@ -86,7 +82,7 @@ So palaeogenomicists first demineralise the bone to release the DNA, before degr
 
 In contrast, the resulting DNA molecules are quite different from the modern DNA.
 Rather than well cooked, soft spaghetti structures - ancient DNA molecules are more like extremely overcooked spaghetti.
-These molecules are highly degraded, broken down into very small fragments, and also often have 'damage' at the ends in the form of 'modified nucleotides' that do not represent the original sequence [@Dabney2013-zo] (see the [Introduction to Ancient DNA chapter](introduction-to-ancient-dna.qmd) for more information).
+These molecules are highly degraded, broken down into very small fragments, and also often have 'damage' at the ends in the form of 'modified nucleotides' that do not represent the original sequence [@Dabney2013-zo, and see the [Introduction to Ancient DNA chapter](introduction-to-ancient-dna.qmd) for more information].
 Finally, the small amount of tiny and damaged DNA molecules typically sits in a 'soup' of 'contaminating' high-quality modern DNA from the surrounding burial and storage environment ([@fig-intro-ngs-fig-ancientdnainbone]).
 
 As we will find out in the next section, these short fragments of DNA is not necessartily a disadvantage for NGS sequencing, but rather a benefit.
@@ -339,7 +335,7 @@ Furthermore, if we have sequenced multiple samples at the same time with multipl
 
 We can do this with two, often integrated, steps.
 
-Base calling is the process of converting the images to digital text-based `A`, `C`, `T,` and `G`s [@Rougemont2008-ugp].
+Base calling is the process of converting the images to digital text-based `A`, `C`, `T,` and `G`s [@Rougemont2008-ug].
 This is not something the vast majority of researchers have to do, as nowadays it happens on the sequencer itself or by the sequencing technicians and thus not necessary for researchers to carry out.
 
 However, once the file with the digital representations of the sequences is taken off the machine, and if not also performed by the sequencing facility, we may have to perform something called 'demultiplexing' (@fig-intro-ngs-fig-demultiplexing).
@@ -455,7 +451,7 @@ All sequencing machines will record 'their confidence' in the base calls they ma
 It is still critical that researchers quality check these before performing downstream analyses.
 
 If our reads have a high number of low base quality socres, the machine may have picked up the wrong nucleotide in the sequence.
-This could cause a range of problems in various aspects of data analysis: our read may falsely taxonomically classified to the wrong organism with that has a more similar sequence to our errored sequence than the original organism, our read may align to wrong place on a genome during [mapping](genome-mapping.qmd) (or not align at all!), prevent sufficient overlap of sequences during assembly causing fragmented assemblies, or even cause false positive variant calls during genotyping for phylogenomic analysis <!-- CITE -->.
+This could cause a range of problems in various aspects of data analysis: our read may falsely taxonomically classified to the wrong organism with that has a more similar sequence to our errored sequence than the original organism, our read may align to wrong place on a genome during [mapping](genome-mapping.qmd) (or not align at all!), prevent sufficient overlap of sequences during assembly causing fragmented assemblies, or even cause false positive variant calls during genotyping for phylogenomic analysis [@Dohm2008-rf].
 
 This is a particular concern for ancient metagenomics due to the very low number of truly endogenous ancient molecules in our libraries.
 This low number of reads means that we cannot as easily 'correct' for errors through simply having many repeated observations of a base call in the same a position (higher depth coverage) from independent DNA molecules.
@@ -473,12 +469,11 @@ To briefly jump ahead into the bioinformatic analysis of an ancient metagenomic
 This is done to classify which species' genome a particular read comes from, and allows us to infer the taxonomic makeup of the sample.
 
 We pull these reference genomes from a range of user-submitted databases, such as the NCBI's GenBank or RefSeq databases.
-However, the genomes that are uploaded to these databases are not always of high quality <!-- CITE conterminator -->.
+However, the genomes that are uploaded to these databases are not always of high quality.
+Some genomes can contain sequences that should not be there, such as adapters, primers, contaminating sequences from other species, or other artefactual sequences [@Longo2011-qd;@Mukherjee2015-vc;@Merchant2014-eu;@Steinegger2020-br;@Breitwieser2019-iz;@Kryukov2016-my].
 While the NCBI does have quality control checks in place, these have not always been as stringent in the past, and are constantly evolving.
 
-This means that some genomes in these databases are 'dirty' - i.e., they contain sequences that should not be there, such as adapters, primers, contaminating sequences from other species, or other artefactual sequences [@Longo2011-qd;@Mukherjee2015-vc;@Merchant2014-eu;@Steinegger2020-br;@Breitwieser2019-iz;@Kryukov2016-my].
-
-A common example, which many ancient metagenomicists have encountered is the repeated identification of _Cyprinus carpio_ (carp) in their samples (@fig-intro-ngs-fig-contamination-blog-screenshots).
+A common example of a notoriously contaminated genome, which many ancient metagenomicists have encountered is the repeated identification of _Cyprinus carpio_ (carp) in their samples (@fig-intro-ngs-fig-contamination-blog-screenshots).
 
 This is not a true hit, but in fact false positive hits due to the presence of adapter sequences in the carp genome ^[See [https://web.archive.org/web/20170823143538/http://www.opiniomics.org/we-need-to-stop-making-this-simple-fcking-mistake/](https://web.archive.org/web/20170823143538/http://www.opiniomics.org/we-need-to-stop-making-this-simple-fcking-mistake/) and [https://web.archive.org/web/20241012070028/https://grahametherington.blogspot.com/2014/09/why-you-should-qc-your-reads-and-your.html](https://web.archive.org/web/20241012070028/https://grahametherington.blogspot.com/2014/09/why-you-should-qc-your-reads-and-your.html)].
 I.e., remaining adapter sequences in the sequencing library that were not properly removed during read preprocessing, align against the adapter sequences in the carp genome, resulting in false positive identification of carp in all samples.
@@ -487,7 +482,7 @@ I.e., remaining adapter sequences in the sequencing library that were not proper
 
 The implication for ancient metagenomicists is that if we do not properly remove artefacts from our reads, we can end up with a lot of false positive hits in our data.
 This can be particularly impactful, for example, if we are trying to identify the presence of dietary species in a human microbiome sample.
-But this also extends to microbes, where insufficient removal of contaminating DNA (e.g. modern human sequences incorporated during sampling) can align against stretches of human sequences incorporated into chimeric reference microbial genomes <!-- CITE Microbiome cancer controvsery? -->.
+But this also extends to microbes, where insufficient removal of contaminating DNA (e.g. modern human sequences incorporated during sampling) can align against stretches of human sequences incorporated into chimeric reference microbial genomes [@Breitwieser2019-iz,@Gihawi2023-hu].
 
 Therefore, while it is always good to perform quality checks on the genomes going into our reference database, we should also thoroughly quality control our sequenced reads _prior_ downstream analysis.
 We should make sure to trim reads of adapters, remove host contamination and other artefacts, and check these steps worked properly!
@@ -526,32 +521,6 @@ We discussed how sequencing methods are not perfect, and how the confidence in b
 
 Finally we discussed some important considerations ancient DNA and ancient metagenomics, including duplicated sequences, index hopping, sequencing errors, causes behind contaminated reference genomes, and poly-G tails in low sequence diversity reads.
 
-## Readings
-
-### Reviews
-
-[@Schuster2008-qx]
-
-[@Shendure2008-fh]
-
-[@Slatko2018-hg]
-
-[@Van_Dijk2014-ep]
-
-### Sequencing Library Construction
-
-[@Kircher2012-fg]
-
-[@Meyer2010-qc]
-
-### Errors and Considerations
-
-[@Ma2019-lg]
-
-[@Sinha2017-zo]
-
-[@Van_der_Valk2019-to]
-
 ## Questions to think about
 
 - Why is Illumina sequencing technologies useful for aDNA?
