Commit 00b5d23

Merge branch 'dev' into nf-core-template-merge-3.4.1
2 parents fe825fb + 52da7b4 commit 00b5d23

381 files changed

Lines changed: 34866 additions & 264 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ testing/
 testing*
 *.pyc
 null/
+.nf-test*

.prettierignore

Lines changed: 2 additions & 0 deletions
@@ -12,3 +12,5 @@ testing*
 bin/
 .nf-test/
 ro-crate-metadata.json
+test/
+dev_docs.md

CHANGELOG.md

Lines changed: 620 additions & 2 deletions
Large diffs are not rendered by default.

CITATION.cff

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+cff-version: 1.2.0
+message: "If you use `nf-core/eager` in your work, please cite the following publication"
+authors:
+  - family-names: Fellows Yates
+    given-names: James A.
+  - family-names: Lamnidis
+    given-names: Thiseas C.
+  - family-names: Borry
+    given-names: Maxime
+  - family-names: Andrades Valtueña
+    given-names: Aida
+  - family-names: Fagernäs
+    given-names: Zandra
+  - family-names: Clayton
+    given-names: Stephen
+  - family-names: Garcia
+    given-names: Maxime U.
+  - family-names: Neukamm
+    given-names: Judith
+  - family-names: Peltzer
+    given-names: Alexander
+title: "Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager"
+version: 3.0.0
+doi: 10.7717/peerj.10947
+date-released: 2022-08-02
+url: https://github.com/nf-core/eager
+preferred-citation:
+  type: article
+  authors:
+    - family-names: Fellows Yates
+      given-names: James A.
+    - family-names: Lamnidis
+      given-names: Thiseas C.
+    - family-names: Borry
+      given-names: Maxime
+    - family-names: Andrades Valtueña
+      given-names: Aida
+    - family-names: Fagernäs
+      given-names: Zandra
+    - family-names: Clayton
+      given-names: Stephen
+    - family-names: Garcia
+      given-names: Maxime U.
+    - family-names: Neukamm
+      given-names: Judith
+    - family-names: Peltzer
+      given-names: Alexander
+  doi: 10.7717/peerj.10947
+  start: e10947
+  title: "Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager"
+  year: 2021
+  url: https://doi.org/10.7717/peerj.10947
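
Top-level key names in a `CITATION.cff` are easy to mistype, and a misspelled key is not something the CFF 1.2.0 schema defines. The following is a minimal sketch of a standard-library-only check, not a substitute for a real CFF validator. The helper name `unknown_top_level_keys` is hypothetical, the key set reflects my reading of the CFF 1.2.0 schema and should be verified against the official schema, and the check only looks at unindented `key:` lines.

```python
import re

# Top-level keys as I understand the CFF 1.2.0 schema (assumption: verify
# against the official schema before relying on this list).
CFF_TOP_LEVEL_KEYS = {
    "abstract", "authors", "cff-version", "commit", "contact",
    "date-released", "doi", "identifiers", "keywords", "license",
    "license-url", "message", "preferred-citation", "references",
    "repository", "repository-artifact", "repository-code", "title",
    "type", "url", "version",
}


def unknown_top_level_keys(cff_text):
    """Return unindented keys in cff_text that are not in CFF_TOP_LEVEL_KEYS."""
    unknown = []
    for line in cff_text.splitlines():
        # Only unindented "key:" lines count as top-level; nested keys and
        # list items start with whitespace and are skipped.
        match = re.match(r"^([A-Za-z][A-Za-z0-9-]*):", line)
        if match and match.group(1) not in CFF_TOP_LEVEL_KEYS:
            unknown.append(match.group(1))
    return unknown


# Hypothetical input with one misspelled top-level key.
sample = """cff-version: 1.2.0
message: "Please cite"
authors:
  - family-names: Doe
    given-names: Jane
title: "Example"
prefered-citation:
  type: article
"""
print(unknown_top_level_keys(sample))
```

Because it matches on line prefixes rather than parsing YAML, this sketch will miss problems inside nested blocks. A schema-aware validator is the better choice for CI.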

CITATIONS.md

Lines changed: 147 additions & 7 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 109 additions & 27 deletions
@@ -1,13 +1,13 @@
 <h1>
 <picture>
-<source media="(prefers-color-scheme: dark)" srcset="docs/images/nf-core-eager_logo_dark.png">
-<img alt="nf-core/eager" src="docs/images/nf-core-eager_logo_light.png">
+<source media="(prefers-color-scheme: dark)" srcset="docs/images/nf-core_eager_logo_outline_drop.png">
+<img alt="nf-core/eager" src="docs/images/nf-core_eager_logo_outline_drop.png">
 </picture>
 </h1>
 
 [![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new/nf-core/eager)
 [![GitHub Actions CI Status](https://github.com/nf-core/eager/actions/workflows/nf-test.yml/badge.svg)](https://github.com/nf-core/eager/actions/workflows/nf-test.yml)
-[![GitHub Actions Linting Status](https://github.com/nf-core/eager/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/eager/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/eager/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
+[![GitHub Actions Linting Status](https://github.com/nf-core/eager/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/eager/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/eager/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.1465061-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.1465061)
 [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
 
 [![Nextflow](https://img.shields.io/badge/version-%E2%89%A525.04.0-green?style=flat&logo=nextflow&logoColor=white&color=%230DC09D&link=https%3A%2F%2Fnextflow.io)](https://www.nextflow.io/)
@@ -19,49 +19,103 @@
 
 [![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23eager-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/eager)[![Follow on Bluesky](https://img.shields.io/badge/bluesky-%40nf__core-1185fe?labelColor=000000&logo=bluesky)](https://bsky.app/profile/nf-co.re)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
 
+![HiRSE Code Promo Badge](https://img.shields.io/badge/Promo-8db427?label=HiRSE&labelColor=005aa0&link=https%3A%2F%2Fgo.fzj.de%2FCodePromo)
+
 ## Introduction
 
-**nf-core/eager** is a bioinformatics pipeline that ...
+**nf-core/eager** is a scalable and reproducible bioinformatics best-practice processing pipeline for genomic NGS data, with a focus on ancient DNA (aDNA). It is ideal for the (palaeo)genomic analysis of humans, animals, plants, microbes and even microbiomes.
 
-<!-- TODO nf-core:
-Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-major pipeline sections and the types of output it produces. You're giving an overview to someone new
-to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+## Pipeline summary
 
 <!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-workflows use the "tube map" design for that. See https://nf-co.re/docs/guidelines/graphic_design/workflow_diagrams#examples for examples. -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
+<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+
+- (Optionally) create reference genome indices for mapping (`bwa`, `samtools`, and `picard`)
+- Sequencing quality control (`FastQC`, `Falco`)
+- Sequencing adapter removal, paired-end data merging (`AdapterRemoval`)
+- Read mapping to reference (`bwa aln`, `bwa mem`, `CircularMapper`, `bowtie2`, or `mapAD`)
+- Post-mapping processing, statistics and conversion to BAM (`samtools` and `preseq`)
+- Ancient DNA C-to-T damage pattern visualisation (`DamageProfiler`)
+- PCR duplicate removal (`DeDup` or `MarkDuplicates`)
+- Post-mapping statistics and BAM quality control (`Qualimap`)
+- Library Complexity Estimation (`preseq`)
+- Overall pipeline statistics summaries (`MultiQC`)
+
+### Additional Steps
+
+The pipeline currently also includes the following functionality:
+
+#### Input
+
+- Automatic merging of complex sequencing setups (e.g. multiple lanes, sequencing configurations, library types)
+
+#### Preprocessing
+
+- Illumina two-coloured sequencer poly-G tail removal (`fastp`)
+- Post-AdapterRemoval trimming of FASTQ files prior to mapping (`fastp`)
+- Automatic conversion of unmapped reads to FASTQ (`samtools`)
+- Host DNA (mapped reads) stripping from input FASTQ files (for sensitive samples)
+
+#### aDNA Damage manipulation
+
+- Damage removal/clipping for UDG+/UDG-half treatment protocols (`BamUtil`)
+- Damaged reads extraction and assessment (`PMDTools`)
+- Nuclear DNA contamination estimation of human samples (`angsd`)
+
+#### Genotyping
+
+- Creation of VCF genotyping files (`GATK UnifiedGenotyper`, `GATK HaplotypeCaller` and `FreeBayes`)
+- Creation of EIGENSTRAT genotyping files (`pileupCaller`)
+- Creation of Genotype Likelihood files (`angsd`)
+- Consensus sequence FASTA creation (`VCF2Genome`)
+- SNP Table generation (`MultiVCFAnalyzer`)
+
+#### Biological Information
+
+- Mitochondrial to Nuclear read ratio calculation (`MtNucRatioCalculator`)
+- Statistical sex determination of human individuals (`Sex.DetERRmine`)
+
+#### Metagenomic Screening
+
+- Low sequence complexity filtering (`BBduk` or `PRINSEQ++`)
+- Taxonomic binner with alignment (`MALT` or `MetaPhlAn 4`)
+- Taxonomic binner without alignment (`Kraken2`, `KrakenUniq`)
+- aDNA characteristic screening of taxonomically binned data from MALT (`MaltExtract`)
+
+#### Functionality Overview
+
+A graphical overview of suggested routes through the pipeline depending on context can be seen below.
+
+<p align="center">
+<img src="docs/images/eager2_metromap_complex.png" alt="nf-core/eager metro map" width="70%">
+</p>
 
 ## Usage
 
 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
 
-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-Explain what rows and columns represent. For instance (please edit as appropriate):
-
 First, prepare a samplesheet with your input data that looks as follows:
 
-`samplesheet.csv`:
+`samplesheet.tsv`:
 
 ```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+sample_id library_id lane colour_chemistry pairment strandedness damage_treatment r1 r2 bam bam_reference_id
+sample1 sample1_a 1 4 paired double none /<path>/<to>/sample1_a_l1_r1.fq.gz /<path>/<to>/sample1_a_l1_r2.fq.gz NA NA
+sample2 sample2_a 2 2 single double full /<path>/<to>/sample2_a_l1_r1.fq.gz NA NA NA
+sample3 sample3_a 8 4 single double half NA NA /<path>/<to>/sample31_a.bam Mammoth_MT_Krause
 ```
 
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
-
--->
+Each row represents a FASTQ file (single-end), a pair of FASTQ files (paired-end), and/or a BAM file.
 
 Now, you can run the pipeline using:
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
-
 ```bash
 nextflow run nf-core/eager \
 -profile <docker/singularity/.../institute> \
 --input samplesheet.csv \
+--fasta '<your_reference>.fasta' \
 --outdir <OUTDIR>
 ```
 
@@ -78,11 +132,40 @@ For more details about the output files and reports, please refer to the
 
 ## Credits
 
-nf-core/eager was originally written by The nf-core/eager community.
+This pipeline was established by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)) and [James A. Fellows Yates](https://github.com/jfy133). Version two had major contributions from [Stephen Clayton](https://github.com/sc13-bioinf), [Thiseas C. Lamnidis](https://github.com/TCLamnidis), [Maxime Borry](https://github.com/maxibor), [Zandra Fagernäs](https://github.com/ZandraFagernas), [Aida Andrades Valtueña](https://github.com/aidaanva), [Maxime Garcia](https://github.com/MaxUlysse) and the nf-core community.
 
 We thank the following people for their extensive assistance in the development of this pipeline:
 
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+- [Alex Hübner](https://github.com/alexhbnr)
+- [Alexandre Gilardet](https://github.com/alexandregilardet)
+- Arielle Munters
+- [Åshild Vågene](https://github.com/ashildv)
+- [Charles Plessy](https://github.com/charles-plessy)
+- [Elina Salmela](https://github.com/esalmela)
+- [Fabian Lehmann](https://github.com/Lehmann-Fabian)
+- [He Yu](https://github.com/paulayu)
+- [Hester van Schalkwyk](https://github.com/hesterjvs)
+- [Ian Light-Máka](https://github.com/ilight1542)
+- [Ido Bar](https://github.com/IdoBar)
+- [Irina Velsko](https://github.com/ivelsko)
+- [Işın Altınkaya](https://github.com/isinaltinkaya)
+- [Johan Nylander](https://github.com/nylander)
+- [Jonas Niemann](https://github.com/NiemannJ)
+- [Katherine Eaton](https://github.com/ktmeaton)
+- [Kathrin Nägele](https://github.com/KathrinNaegele)
+- [Kevin Lord](https://github.com/lordkev)
+- [Luc Venturini](https://github.com/lucventurini)
+- [Mahesh Binzer-Panchal](https://github.com/mahesh-panchal)
+- [Marcel Keller](https://github.com/marcel-keller)
+- [Megan Michel](https://github.com/meganemichel)
+- [Merlin Szymanski](https://github.com/merszym)
+- [Pierre Lindenbaum](https://github.com/lindenb)
+- [Pontus Skoglund](https://github.com/pontussk)
+- [Raphael Eisenhofer](https://github.com/EisenRa)
+- [Roberta Davidson](https://github.com/roberta-davidson)
+- [Rodrigo Barquera](https://github.com/RodrigoBarquera)
+- [Selina Carlhoff](https://github.com/scarlhoff)
+- [Torsten Günther](https://bitbucket.org/tguenther)
 
 ## Contributions and Support
 
@@ -92,10 +175,9 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 
 ## Citations
 
-<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
-<!-- If you use nf-core/eager for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
+If you use nf-core/eager for your analysis, please cite it using the following doi:
 
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
+> Fellows Yates JA, Lamnidis TC, Borry M, Andrades Valtueña A, Fagernäs Z, Clayton S, Garcia MU, Neukamm J, Peltzer A. 2021. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:e10947. DOI: [10.7717/peerj.10947](https://doi.org/10.7717/peerj.10947).
 
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
 
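
The samplesheet layout introduced in the README's Usage section can be sanity-checked before a run. The following is a minimal sketch that assumes the first column is named `sample_id`, that columns are tab-separated, and that `NA` marks an empty field. The consistency rules in `check_row` are inferred from the three example rows only; they are hypothetical and are not the pipeline's own validation logic.

```python
import csv
import io

# Column names as shown in the README samplesheet example.
COLUMNS = [
    "sample_id", "library_id", "lane", "colour_chemistry", "pairment",
    "strandedness", "damage_treatment", "r1", "r2", "bam", "bam_reference_id",
]


def check_row(row):
    """Return a list of problems for one row (hypothetical rules, see lead-in)."""
    problems = []
    has_r1 = row["r1"] != "NA"
    has_r2 = row["r2"] != "NA"
    has_bam = row["bam"] != "NA"
    if not (has_r1 or has_bam):
        problems.append("row has neither r1 nor bam")
    if row["pairment"] == "paired" and has_r1 and not has_r2:
        problems.append("paired row is missing r2")
    if row["pairment"] == "single" and has_r2:
        problems.append("single row should not have r2")
    if has_bam and row["bam_reference_id"] == "NA":
        problems.append("bam given without bam_reference_id")
    return problems


def check_samplesheet(text):
    """Map 1-based data row numbers to their problems; key 0 flags the header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != COLUMNS:
        return {0: ["unexpected header"]}
    report = {}
    for number, row in enumerate(reader, start=1):
        problems = check_row(row)
        if problems:
            report[number] = problems
    return report


# Hypothetical row: declared as paired but with no r2 file.
example = "\t".join(COLUMNS) + "\n" + "\t".join([
    "sample1", "sample1_a", "1", "4", "paired", "double", "none",
    "sample1_a_l1_r1.fq.gz", "NA", "NA", "NA",
]) + "\n"
print(check_samplesheet(example))
```

An empty report means no rule was triggered. It does not mean the pipeline will accept the sheet.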

477 KB
Binary file not shown.
