[](https://github.com/codespaces/new/nf-core/eager)
9
9
[](https://github.com/nf-core/eager/actions/workflows/nf-test.yml)
10
-
[](https://github.com/nf-core/eager/actions/workflows/linting.yml)[](https://nf-co.re/eager/results)[](https://doi.org/10.5281/zenodo.XXXXXXX)
10
+
[](https://github.com/nf-core/eager/actions/workflows/linting.yml)[](https://nf-co.re/eager/results)[](https://doi.org/10.5281/zenodo.1465061)
[](https://nfcore.slack.com/channels/eager)[](https://bsky.app/profile/nf-co.re)[](https://mstdn.science/@nf_core)[](https://www.youtube.com/c/nf-core)
**nf-core/eager** is a bioinformatics pipeline that ...
26
+
**nf-core/eager** is a scalable and reproducible best-practice bioinformatics pipeline for processing genomic NGS data, with a focus on ancient DNA (aDNA) data. It is ideal for the (palaeo)genomic analysis of humans, animals, plants, microbes and even microbiomes.
25
27
26
-
<!-- TODO nf-core:
27
-
Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
28
-
major pipeline sections and the types of output it produces. You're giving an overview to someone new
29
-
to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
30
-
-->
28
+
## Pipeline summary
31
29
32
30
<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
33
-
workflows use the "tube map" design for that. See https://nf-co.re/docs/guidelines/graphic_design/workflow_diagrams#examples for examples. -->
34
-
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
31
+
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
32
+
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
33
+
34
+
- (Optionally) create reference genome indices for mapping (`bwa`, `samtools`, and `picard`)
35
+
- Sequencing quality control (`FastQC`, `Falco`)
36
+
- Sequencing adapter removal, paired-end data merging (`AdapterRemoval`)
37
+
- Read mapping to a reference genome (`bwa aln`, `bwa mem`, `CircularMapper`, `bowtie2`, or `mapAD`)
38
+
- Post-mapping processing, statistics and conversion to BAM (`samtools` and `preseq`)
39
+
- Ancient DNA C-to-T damage pattern visualisation (`DamageProfiler`)
40
+
- PCR duplicate removal (`DeDup` or `MarkDuplicates`)
41
+
- Post-mapping statistics and BAM quality control (`Qualimap`)
- Post-AdapterRemoval trimming of FASTQ files prior to mapping (`fastp`)
57
+
- Automatic conversion of unmapped reads to FASTQ (`samtools`)
58
+
- Stripping of host DNA (mapped reads) from input FASTQ files (for sensitive samples)
59
+
60
+
#### aDNA Damage Manipulation
61
+
62
+
- Damage removal/clipping for UDG+/UDG-half treatment protocols (`BamUtil`)
63
+
- Damaged reads extraction and assessment (`PMDTools`)
64
+
- Nuclear DNA contamination estimation of human samples (`angsd`)
65
+
66
+
#### Genotyping
67
+
68
+
- Creation of VCF genotyping files (`GATK UnifiedGenotyper`, `GATK HaplotypeCaller` and `FreeBayes`)
69
+
- Creation of EIGENSTRAT genotyping files (`pileupCaller`)
70
+
- Creation of Genotype Likelihood files (`angsd`)
71
+
- Consensus sequence FASTA creation (`VCF2Genome`)
72
+
- SNP Table generation (`MultiVCFAnalyzer`)
73
+
74
+
#### Biological Information
75
+
76
+
- Mitochondrial to Nuclear read ratio calculation (`MtNucRatioCalculator`)
77
+
- Statistical sex determination of human individuals (`Sex.DetERRmine`)
78
+
79
+
#### Metagenomic Screening
80
+
81
+
- Low-complexity sequence filtering (`BBduk` or `PRINSEQ++`)
82
+
- Taxonomic binner with alignment (`MALT` or `MetaPhlAn 4`)
83
+
- Taxonomic binner without alignment (`Kraken2`, `KrakenUniq`)
84
+
- aDNA characteristic screening of taxonomically binned data from MALT (`MaltExtract`)
85
+
86
+
#### Functionality Overview
87
+
88
+
A graphical overview of suggested routes through the pipeline depending on context can be seen below.
89
+
90
+
<p align="center">
91
+
<img src="docs/images/eager2_metromap_complex.png" alt="nf-core/eager metro map" width="70%">
92
+
</p>
35
93
36
94
## Usage
37
95
38
96
> [!NOTE]
39
97
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
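
As a rough sketch of that setup check, the snippet below assembles a test-profile command and prints it, running it only when asked to. The container engine (`docker`), the output directory name (`results_test`) and the `RUN` switch are illustrative assumptions, not values taken from this README; adjust them to your environment.

```bash
# Assumptions: Docker is the container engine and "results_test" is a scratch
# output directory. Both are placeholders; edit them for your own system.
engine="docker"
outdir="results_test"

# Build the test-profile command as a string so it can be inspected first
test_cmd="nextflow run nf-core/eager -profile test,${engine} --outdir ${outdir}"
echo "${test_cmd}"

# Execute only when explicitly requested (RUN=1) and Nextflow is installed.
# The unquoted expansion relies on the values above containing no spaces.
if [ "${RUN:-0}" = "1" ]; then
  if command -v nextflow >/dev/null 2>&1; then
    ${test_cmd}
  else
    echo "nextflow not found on PATH" >&2
  fi
fi
```

Printing the command before executing it makes it easy to confirm the profile and output directory before any data is downloaded.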
40
98
41
-
<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
42
-
Explain what rows and columns represent. For instance (please edit as appropriate):
43
-
44
99
First, prepare a samplesheet with your input data that looks as follows:
sample1 sample1_a 1 4 paired double none /<path>/<to>/sample1_a_l1_r1.fq.gz /<path>/<to>/sample1_a_l1_r2.fq.gz NA NA
106
+
sample2 sample2_a 2 2 single double full /<path>/<to>/sample2_a_l1_r1.fq.gz NA NA NA
107
+
sample3 sample3_a 8 4 single double half NA NA /<path>/<to>/sample31_a.bam Mammoth_MT_Krause
51
108
```
52
109
53
-
Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
54
-
55
-
-->
110
+
Each row represents a FASTQ file (single-end), a pair of FASTQ files (paired-end), and/or a BAM file.
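
The row layout described above can be sanity-checked before launching a run. The helper below is a hypothetical sketch, not part of the pipeline: it assumes the samplesheet is tab-separated and infers the field positions purely from the example rows (11 fields per row; field 5 is `paired` or `single`; fields 8 and 9 hold FASTQ paths; field 10 holds a BAM path; `NA` marks an empty field).

```bash
# Hypothetical helper: checks only the layout visible in the example rows.
# Assumes tab-separated fields; the pipeline's own validation is authoritative.
check_samplesheet() {
  awk -F'\t' '
    # Every row should carry the same 11 fields
    NF != 11 { printf "line %d: expected 11 fields, found %d\n", NR, NF; bad = 1; next }
    # A paired-end row needs both FASTQ files
    $5 == "paired" && ($8 == "NA" || $9 == "NA") {
      printf "line %d: paired row needs two FASTQ files\n", NR; bad = 1
    }
    # Each row needs at least a first FASTQ file or a BAM file
    $8 == "NA" && $10 == "NA" { printf "line %d: no FASTQ or BAM input\n", NR; bad = 1 }
    # Exit non-zero if any row failed a check
    END { exit bad + 0 }
  ' "$1"
}
```

It could be used as `check_samplesheet samplesheet.csv && echo "layout looks consistent"`. A header row, if present, passes these checks as long as it also has 11 fields.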
56
111
57
112
Now, you can run the pipeline using:
58
113
59
-
<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
60
-
61
114
```bash
62
115
nextflow run nf-core/eager \
63
116
-profile <docker/singularity/.../institute> \
64
117
--input samplesheet.csv \
118
+
--fasta '<your_reference>.fasta' \
65
119
--outdir <OUTDIR>
66
120
```
67
121
@@ -78,11 +132,40 @@ For more details about the output files and reports, please refer to the
78
132
79
133
## Credits
80
134
81
-
nf-core/eager was originally written by The nf-core/eager community.
135
+
This pipeline was established by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)) and [James A. Fellows Yates](https://github.com/jfy133). Version two had major contributions from [Stephen Clayton](https://github.com/sc13-bioinf), [Thiseas C. Lamnidis](https://github.com/TCLamnidis), [Maxime Borry](https://github.com/maxibor), [Zandra Fagernäs](https://github.com/ZandraFagernas), [Aida Andrades Valtueña](https://github.com/aidaanva), [Maxime Garcia](https://github.com/MaxUlysse) and the nf-core community.
82
136
83
137
We thank the following people for their extensive assistance in the development of this pipeline:
84
138
85
-
<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
@@ -92,10 +175,9 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
92
175
93
176
## Citations
94
177
95
-
<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
96
-
<!-- If you use nf-core/eager for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
178
+
If you use nf-core/eager for your analysis, please cite it using the following publication and DOI:
97
179
98
-
<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
180
+
> Fellows Yates JA, Lamnidis TC, Borry M, Andrades Valtueña A, Fagernäs Z, Clayton S, Garcia MU, Neukamm J, Peltzer A. 2021. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:e10947. DOI: [10.7717/peerj.10947](https://doi.org/10.7717/peerj.10947).
99
181
100
182
An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.