README.md: 76 additions & 24 deletions
@@ -12,28 +12,60 @@
12
12
13
13
## Introduction
14
14
15
-
**nf-core/eager** is a bioinformatics best-practice analysis pipeline for ancient DNA data analysis.
16
-
17
-
The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads and performs extensive quality-control on the results. It comes with docker / singularity containers making installation trivial and results highly reproducible.
18
-
19
-
### Pipeline steps
20
-
21
-
* Create reference genome indices (optional)
22
-
* BWA
23
-
* Samtools Index
24
-
* Sequence Dictionary
25
-
* QC with FastQC
26
-
* AdapterRemoval for read clipping and merging
27
-
* Read mapping with BWA, BWA Mem or CircularMapper
28
-
* Samtools sort, index, stats & conversion to BAM
29
-
* DeDup or MarkDuplicates read deduplication
30
-
* QualiMap BAM QC Checking
31
-
* Preseq Library Complexity Estimation
32
-
* DamageProfiler damage profiling
33
-
* BAM Clipping for UDG+/UDGhalf protocols
34
-
* PMDTools damage filtering / assessment
35
-
36
-
### Documentation
15
+
**nf-core/eager** is a bioinformatics best-practice analysis pipeline for
16
+
NGS-based ancient DNA (aDNA) data analysis.
17
+
18
+
The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics
19
+
workflow tool. It pre-processes raw data from FASTQ inputs, aligns the reads
20
+
and performs extensive general NGS and aDNA-specific quality control on the
21
+
results. It comes with Docker and Singularity containers or a conda environment, making
22
+
installation trivial and results highly reproducible.
23
+
24
+
## Pipeline steps
25
+
26
+
By default the pipeline currently performs the following:
27
+
28
+
* Create reference genome indices for mapping (`bwa`, `samtools`, and `picard`)
29
+
* Sequencing quality control (`FastQC`)
30
+
* Sequencing adapter removal and, for paired-end data, read merging (`AdapterRemoval`)
31
+
* Read mapping to the reference (`bwa aln`, `bwa mem` or `CircularMapper`)
32
+
* Post-mapping processing, statistics and conversion to BAM (`samtools`)
33
+
* Ancient DNA C-to-T damage pattern visualisation (`DamageProfiler`)
34
+
* PCR duplicate removal (`DeDup` or `MarkDuplicates`)
35
+
* Post-mapping statistics and BAM quality control (`Qualimap`)
* Automatic conversion of unmapped reads to FASTQ (`samtools`)
43
+
* Damage removal/clipping for UDG+/UDG-half treatment protocols (`BamUtil`)
44
+
* Damaged read extraction and assessment (`PMDTools`)
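
As a rough illustration of the first step in the list above, the sketch below prints the kind of commands typically used to build the three reference indices by hand. It is a dry run only: the helper name `index_cmds` is hypothetical, and the exact invocations the pipeline uses internally may differ.

```bash
# Hypothetical dry-run helper (not part of the pipeline): prints the
# indexing commands for a reference FASTA instead of running them.
index_cmds() {
    ref=$1
    # Assumption: the sequence dictionary sits next to the FASTA and
    # replaces its extension with .dict
    dict="${ref%.*}.dict"
    echo "bwa index $ref"                                  # BWA index
    echo "samtools faidx $ref"                             # FASTA index (.fai)
    echo "picard CreateSequenceDictionary R=$ref O=$dict"  # sequence dictionary
}

index_cmds reference.fasta
```

Dropping the `echo` in front of each command would run the tools directly, assuming `bwa`, `samtools` and `picard` are on the `PATH`.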
45
+
46
+
## Quick Start
47
+
48
+
1. Install [`nextflow`](docs/installation.md)
49
+
2. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)
50
+
3. Download the EAGER pipeline
51
+
52
+
```bash
53
+
nextflow pull nf-core/eager
54
+
```
55
+
56
+
4. Set up your job with default parameters
57
+
58
+
```bash
59
+
nextflow run nf-core/eager -profile <docker/singularity/conda> --reads '*_R{1,2}.fastq.gz' --fasta '<REFERENCE>.fasta'
60
+
```
61
+
62
+
5. See the overview of the run under `<OUTPUT_DIR>/MultiQC/multiqc_report.html`
63
+
64
+
Modifications to the default pipeline are easily made using various options
65
+
as described in the documentation.
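
The Quick Start command can also be assembled by a small wrapper, which makes the quoting explicit: the `--reads` glob has to reach Nextflow unexpanded, so it is kept in single quotes. This is a minimal sketch, assuming the pipeline name `nf-core/eager` from the `nextflow pull` step; the function name `build_eager_cmd` and its argument order are hypothetical, and only `-profile`, `--reads` and `--fasta` come from this README.

```bash
# Hypothetical helper: prints (does not run) a Quick Start command.
# Assumes the values contain no single quotes.
build_eager_cmd() {
    eager_profile=$1
    eager_reads=$2
    eager_fasta=$3

    # Accept only the three profiles named in the Quick Start.
    case "$eager_profile" in
        docker|singularity|conda) ;;
        *) echo "unknown profile: $eager_profile" >&2; return 1 ;;
    esac

    # Single quotes in the printed line stop the shell from expanding
    # the read glob before Nextflow sees it.
    printf "nextflow run nf-core/eager -profile %s --reads '%s' --fasta '%s'\n" \
        "$eager_profile" "$eager_reads" "$eager_fasta"
}

build_eager_cmd docker '*_R{1,2}.fastq.gz' 'reference.fasta'
```

Printing the command first, rather than executing it, gives a chance to check the profile and paths before starting a run.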
66
+
67
+
## Documentation
68
+
37
69
The nf-core/eager pipeline comes with documentation about the pipeline, found in the `docs/` directory:
38
70
39
71
1. [Installation](docs/installation.md)
@@ -44,5 +76,25 @@ The nf-core/eager pipeline comes with documentation about the pipeline, found in
44
76
4. [Output and how to interpret the results](docs/output.md)
45
77
5. [Troubleshooting](docs/troubleshooting.md)
46
78
47
-
### Credits
48
-
This pipeline was written by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)), with major contributions from Stephen Clayton, ideas and documentation from James Fellows Yates, Raphael Eisenhofer and Judith Neukamm. If you want to contribute, please open an issue and ask to be added to the project - happy to do so and everyone is welcome to contribute here!
79
+
## Credits
80
+
81
+
This pipeline was written by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)),
82
+
with major contributions from Stephen Clayton, ideas and documentation from
83
+
James Fellows Yates, Raphael Eisenhofer and Judith Neukamm. If you want to
84
+
contribute, please open an issue and ask to be added to the project. Everyone
85
+
is welcome to contribute!
86
+
87
+
## Tool References
88
+
89
+
**EAGER v1, CircularMapper, DeDup** Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z) Download: [https://github.com/apeltzer/EAGER-GUI](https://github.com/apeltzer/EAGER-GUI) and [https://github.com/apeltzer/EAGER-CLI](https://github.com/apeltzer/EAGER-CLI)
**AdapterRemoval v2** Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88. [https://doi.org/10.1186/s13104-016-1900-2](https://doi.org/10.1186/s13104-016-1900-2) Download: [https://github.com/MikkelSchubert/adapterremoval](https://github.com/MikkelSchubert/adapterremoval)
92
+
**bwa** Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760. [https://doi.org/10.1093/bioinformatics/btp324](https://doi.org/10.1093/bioinformatics/btp324) Download: [http://bio-bwa.sourceforge.net/bwa.shtml](http://bio-bwa.sourceforge.net/bwa.shtml)
93
+
**SAMtools** Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. [https://doi.org/10.1093/bioinformatics/btp352](https://doi.org/10.1093/bioinformatics/btp352) Download: [http://www.htslib.org/](http://www.htslib.org/)
94
+
**DamageProfiler** Judith Neukamm (unpublished)
95
+
**Qualimap** Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics, 32(2), 292–294. [https://doi.org/10.1093/bioinformatics/btv566](https://doi.org/10.1093/bioinformatics/btv566) Download: [http://qualimap.bioinfo.cipf.es/](http://qualimap.bioinfo.cipf.es/)
96
+
**preseq** Daley, T., & Smith, A. D. (2013). Predicting the molecular complexity of sequencing libraries. Nature Methods, 10(4), 325–327. [https://doi.org/10.1038/nmeth.2375](https://doi.org/10.1038/nmeth.2375) Download: [http://smithlabresearch.org/software/preseq/](http://smithlabresearch.org/software/preseq/)
97
+
**PMDTools** Skoglund, P., Northoff, B. H., Shunkov, M. V., Derevianko, A. P., Pääbo, S., Krause, J., & Jakobsson, M. (2014). Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proceedings of the National Academy of Sciences of the United States of America, 111(6), 2229–2234. [https://doi.org/10.1073/pnas.1318934111](https://doi.org/10.1073/pnas.1318934111) Download: [https://github.com/pontussk/PMDtools](https://github.com/pontussk/PMDtools)
98
+
**MultiQC** Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. [https://doi.org/10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354) Download: [https://multiqc.info/](https://multiqc.info/)
99
+
**BamUtil** Jun, G., Wing, M. K., Abecasis, G. R., & Kang, H. M. (2015). An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Research, 25(6), 918–925. [https://doi.org/10.1101/gr.176552.114](https://doi.org/10.1101/gr.176552.114) Download: [https://genome.sph.umich.edu/wiki/BamUtil](https://genome.sph.umich.edu/wiki/BamUtil)