Commit 850ba3d

Merge branch 'dev' into nf-core-template-merge-3.2.1
2 parents d9abbac + 248545e commit 850ba3d

372 files changed

Lines changed: 31642 additions & 254 deletions

.editorconfig

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ trim_trailing_whitespace = true
 indent_size = 4
 indent_style = space
 
-[*.{md,yml,yaml,html,css,scss,js}]
+[*.{md,yml,yaml,html,css,scss,js,cff}]
 indent_size = 2
 
 # These files are edited and tested upstream in nf-core/modules
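
As a rough illustration of what the widened section header covers, the sketch below expands the brace list from the new `[*.{...,cff}]` line and tests file names against it. The helper name `uses_two_space_indent` is hypothetical, and matching on the base name only is a simplification of EditorConfig's glob rules, not a full implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Extensions listed in the new section header, including the added "cff".
TWO_SPACE_EXTENSIONS = ["md", "yml", "yaml", "html", "css", "scss", "js", "cff"]


def uses_two_space_indent(path: str) -> bool:
    """Return True if the file name matches any expanded `*.<ext>` pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, f"*.{ext}") for ext in TWO_SPACE_EXTENSIONS)


print(uses_two_space_indent("CITATION.cff"))              # True
print(uses_two_space_indent(".github/workflows/ci.yml"))  # True
print(uses_two_space_indent("main.nf"))                   # False
```

With this change, the new `CITATION.cff` file added later in the commit falls under the two-space rule.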

.github/workflows/ci.yml

Lines changed: 11 additions & 1 deletion
@@ -43,6 +43,16 @@ jobs:
             profile: "conda"
           - isMaster: false
             profile: "singularity"
+        PARAMS:
+          - " --preprocessing_tool fastp --preprocessing_adapterlist 'https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/fastp/adapters.fasta'"
+          - " --preprocessing_tool adapterremoval --preprocessing_adapterlist 'https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/adapterremoval/adapterremoval_adapterlist.txt' --sequencing_qc_tool falco --run_genotyping --genotyping_tool 'freebayes' --genotyping_source 'raw'"
+          - " --mapping_tool bwamem --run_mapdamage_rescaling --run_pmd_filtering --run_trim_bam --run_genotyping --genotyping_tool 'ug' --genotyping_source 'trimmed'"
+          - " --mapping_tool bowtie2 --damagecalculation_tool mapdamage --damagecalculation_mapdamage_downsample 100 --run_genotyping --genotyping_tool 'hc' --genotyping_source 'raw'"
+          - " --mapping_tool mapad"
+          - " --mapping_tool circularmapper --skip_preprocessing --convert_inputbam --fasta_circular_target 'NC_007596.2' --fasta_circularmapper_elongationfactor 500"
+          - "_humanbam --run_mtnucratio --run_contamination_estimation_angsd --snpcapture_bed 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz' --run_genotyping --genotyping_tool 'pileupcaller' --genotyping_source 'raw'"
+          - "_humanbam --run_sexdeterrmine --run_genotyping --genotyping_tool 'angsd' --genotyping_source 'raw'"
+          - "_multiref" ## TODO add damage manipulation here instead once it goes multiref
     steps:
       - name: Check out pipeline code
         uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
@@ -85,4 +95,4 @@ jobs:
       - name: "Run pipeline with test data ${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }}"
         continue-on-error: ${{ matrix.NXF_VER == 'latest-everything' }}
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_name }},${{ matrix.profile }} --outdir ./results
+          nextflow run ${GITHUB_WORKSPACE} -profile ${{matrix.profile}},${{ matrix.test_name }}${{ matrix.PARAMS }} --outdir ./results
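
The new `run:` line concatenates `matrix.test_name` and `matrix.PARAMS` with no separator, which is presumably why some `PARAMS` entries start with a space (extra flags) and others with an underscore (a suffix on the test profile name). The sketch below mimics that concatenation in plain Python. The function name `build_command` is hypothetical, and the values `"docker"` and `"test"` are assumed matrix entries that do not appear in the hunk itself.

```python
def build_command(profile: str, test_name: str, params: str) -> str:
    """Mimic the string concatenation done by the workflow's `run:` line."""
    workspace = "${GITHUB_WORKSPACE}"  # left unexpanded, as in the workflow file
    return (
        f"nextflow run {workspace} "
        f"-profile {profile},{test_name}{params} --outdir ./results"
    )


# Two PARAMS entries copied from the matrix above; "docker" and "test" are assumed.
for params in [" --mapping_tool mapad", "_multiref"]:
    print(build_command("docker", "test", params))
```

The first entry leaves the profile as `docker,test` and appends `--mapping_tool mapad`; the second turns the profile list into `docker,test_multiref`.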

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ testing/
 testing*
 *.pyc
 null/
+.nf-test*

.prettierignore

Lines changed: 2 additions & 0 deletions
@@ -11,3 +11,5 @@ testing*
 *.pyc
 bin/
 ro-crate-metadata.json
+test/
+dev_docs.md

CHANGELOG.md

Lines changed: 620 additions & 2 deletions
Large diffs are not rendered by default.

CITATION.cff

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+cff-version: 1.2.0
+message: "If you use `nf-core/eager` in your work, please cite the following publication"
+authors:
+  - family-names: Fellows Yates
+    given-names: James A.
+  - family-names: Lamnidis
+    given-names: Thiseas C.
+  - family-names: Borry
+    given-names: Maxime
+  - family-names: Andrades Valtueña
+    given-names: Aida
+  - family-names: Fagernãs
+    given-names: Zandra
+  - family-names: Clayton
+    given-names: Stephen
+  - family-names: Garcia
+    given-names: Maxime U.
+  - family-names: Neukamm
+    given-names: Judith
+  - family-names: Peltzer
+    given-names: Alexander
+title: "Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager"
+version: 3.0.0
+doi: 10.7717/peerj.10947
+date-released: 2022-08-02
+url: https://github.com/nf-core/eager
+prefered-citation:
+  type: article
+  authors:
+    - family-names: Fellows Yates
+      given-names: James A.
+    - family-names: Lamnidis
+      given-names: Thiseas C.
+    - family-names: Borry
+      given-names: Maxime
+    - family-names: Andrades Valtueña
+      given-names: Aida
+    - family-names: Fagernãs
+      given-names: Zandra
+    - family-names: Clayton
+      given-names: Stephen
+    - family-names: Garcia
+      given-names: Maxime U.
+    - family-names: Neukamm
+      given-names: Judith
+    - family-names: Peltzer
+      given-names: Alexander
+  doi: 10.7717/peerj.10947
+  start: e10947
+  title: "Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager"
+  year: 2021
+  url: https://dx.doi.org/10.1038/10.7717/peerj.10947
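
A quick way to sanity-check a file like this is to compare its top-level keys against those defined by the Citation File Format 1.2.0 schema, which names the nested block `preferred-citation`. The sketch below is a standard-library-only approximation: it scans unindented `key:` lines rather than parsing YAML, the function name `unknown_top_level_keys` is hypothetical, and the key set is reproduced from the schema as I understand it.

```python
import re

# Top-level keys defined by the Citation File Format 1.2.0 schema.
CFF_TOP_LEVEL_KEYS = {
    "abstract", "authors", "cff-version", "commit", "contact",
    "date-released", "doi", "identifiers", "keywords", "license",
    "license-url", "message", "preferred-citation", "references",
    "repository", "repository-artifact", "repository-code", "title",
    "type", "url", "version",
}


def unknown_top_level_keys(cff_text: str) -> list:
    """Return unindented keys that the CFF 1.2.0 schema does not define."""
    unknown = []
    for line in cff_text.splitlines():
        match = re.match(r"^([A-Za-z][A-Za-z0-9-]*):", line)
        if match and match.group(1) not in CFF_TOP_LEVEL_KEYS:
            unknown.append(match.group(1))
    return unknown


# A few lines copied from the hunk above.
sample = """cff-version: 1.2.0
doi: 10.7717/peerj.10947
prefered-citation:
  type: article
  year: 2021
"""
print(unknown_top_level_keys(sample))  # ['prefered-citation']
```

Run against the file as committed, a check like this would flag `prefered-citation`, so citation tooling that follows the schema may not pick up the nested article entry.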

CITATIONS.md

Lines changed: 147 additions & 7 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 105 additions & 25 deletions
@@ -1,7 +1,7 @@
 <h1>
   <picture>
-    <source media="(prefers-color-scheme: dark)" srcset="docs/images/nf-core-eager_logo_dark.png">
-    <img alt="nf-core/eager" src="docs/images/nf-core-eager_logo_light.png">
+    <source media="(prefers-color-scheme: dark)" srcset="docs/images/nf-core_eager_logo_outline_drop.png">
+    <img alt="nf-core/eager" src="docs/images/nf-core_eager_logo_outline_drop.png">
   </picture>
 </h1>
 

@@ -19,47 +19,99 @@
 
 ## Introduction
 
-**nf-core/eager** is a bioinformatics pipeline that ...
+**nf-core/eager** is a scalable and reproducible bioinformatics best-practise processing pipeline for genomic NGS sequencing data, with a focus on ancient DNA (aDNA) data. It is ideal for the (palaeo)genomic analysis of humans, animals, plants, microbes and even microbiomes.
 
-<!-- TODO nf-core:
-Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-major pipeline sections and the types of output it produces. You're giving an overview to someone new
-to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+## Pipeline summary
 
 <!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
 workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+
+- (Optionally) create reference genome indices for mapping (`bwa`, `samtools`, and `picard`)
+- Sequencing quality control (`FastQC`, `Falco`)
+- Sequencing adapter removal, paired-end data merging (`AdapterRemoval`)
+- Read mapping to reference using (`bwa aln`, `bwa mem`, `CircularMapper`, `bowtie2`, or `mapAD`)
+- Post-mapping processing, statistics and conversion to bam (`samtools`, and `preseq`)
+- Ancient DNA C-to-T damage pattern visualisation (`DamageProfiler`)
+- PCR duplicate removal (`DeDup` or `MarkDuplicates`)
+- Post-mapping statistics and BAM quality control (`Qualimap`)
+- Library Complexity Estimation (`preseq`)
+- Overall pipeline statistics summaries (`MultiQC`)
+
+### Additional Steps
+
+Additional functionality contained by the pipeline currently includes:
+
+#### Input
+
+- Automatic merging of complex sequencing setups (e.g. multiple lanes, sequencing configurations, library types)
+
+#### Preprocessing
+
+- Illumina two-coloured sequencer poly-G tail removal (`fastp`)
+- Post-AdapterRemoval trimming of FASTQ files prior mapping (`fastp`)
+- Automatic conversion of unmapped reads to FASTQ (`samtools`)
+- Host DNA (mapped reads) stripping from input FASTQ files (for sensitive samples)
+
+#### aDNA Damage manipulation
+
+- Damage removal/clipping for UDG+/UDG-half treatment protocols (`BamUtil`)
+- Damaged reads extraction and assessment (`PMDTools`)
+- Nuclear DNA contamination estimation of human samples (`angsd`)
+
+#### Genotyping
+
+- Creation of VCF genotyping files (`GATK UnifiedGenotyper`, `GATK HaplotypeCaller` and `FreeBayes`)
+- Creation of EIGENSTRAT genotyping files (`pileupCaller`)
+- Creation of Genotype Likelihood files (`angsd`)
+- Consensus sequence FASTA creation (`VCF2Genome`)
+- SNP Table generation (`MultiVCFAnalyzer`)
+
+#### Biological Information
+
+- Mitochondrial to Nuclear read ratio calculation (`MtNucRatioCalculator`)
+- Statistical sex determination of human individuals (`Sex.DetERRmine`)
+
+#### Metagenomic Screening
+
+- Low-sequenced complexity filtering (`BBduk` or `PRINSEQ++`)
+- Taxonomic binner with alignment (`MALT` or `MetaPhlAn 4`)
+- Taxonomic binner without alignment (`Kraken2`,`KrakenUniq`)
+- aDNA characteristic screening of taxonomically binned data from MALT (`MaltExtract`)
+
+#### Functionality Overview
+
+A graphical overview of suggested routes through the pipeline depending on context can be seen below.
+
+<p align="center">
+<img src="docs/images/eager2_metromap_complex.png" alt="nf-core/eager metro map" width="70%"
+</p>
 
 ## Usage
 
 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
 
-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-Explain what rows and columns represent. For instance (please edit as appropriate):
-
 First, prepare a samplesheet with your input data that looks as follows:
 
-`samplesheet.csv`:
+`samplesheet.tsv`:
 
 ```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+ample_id library_id lane colour_chemistry pairment strandedness damage_treatment r1 r2 bam bam_reference_id
+sample1 sample1_a 1 4 paired double none /<path>/<to>/sample1_a_l1_r1.fq.gz /<path>/<to>/sample1_a_l1_r2.fq.gz NA NA
+sample2 sample2_a 2 2 single double full /<path>/<to>/sample2_a_l1_r1.fq.gz NA NA NA
+sample3 sample3_a 8 4 single double half NA NA /<path>/<to>/sample31_a.bam Mammoth_MT_Krause
 ```
 
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
-
--->
+Each row represents a fastq file (single-end), pair of fastq files (paired end), and/or a bam file.
 
 Now, you can run the pipeline using:
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
-
 ```bash
 nextflow run nf-core/eager \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
+   --fasta '<your_reference>.fasta' \
    --outdir <OUTDIR>
 ```
 

@@ -76,11 +128,40 @@ For more details about the output files and reports, please refer to the
 
 ## Credits
 
-nf-core/eager was originally written by The nf-core/eager community.
+This pipeline was established by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)) and [James A. Fellows Yates](https://github.com/jfy133). Version two had major contributions from [Stephen Clayton](https://github.com/sc13-bioinf), [Thiseas C. Lamnidis](https://github.com/TCLamnidis), [Maxime Borry](https://github.com/maxibor), [Zandra Fagernäs](https://github.com/ZandraFagernas), [Aida Andrades Valtueña](https://github.com/aidaanva) and [Maxime Garcia](https://github.com/MaxUlysse) and the nf-core community.
 
 We thank the following people for their extensive assistance in the development of this pipeline:
 
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+- [Alex Hübner](https://github.com/alexhbnr)
+- [Alexandre Gilardet](https://github.com/alexandregilardet)
+- Arielle Munters
+- [Åshild Vågene](https://github.com/ashildv)
+- [Charles Plessy](https://github.com/charles-plessy)
+- [Elina Salmela](https://github.com/esalmela)
+- [Fabian Lehmann](https://github.com/Lehmann-Fabian)
+- [He Yu](https://github.com/paulayu)
+- [Hester van Schalkwyk](https://github.com/hesterjvs)
+- [Ian Light-Máka](https://github.com/ilight1542)
+- [Ido Bar](https://github.com/IdoBar)
+- [Irina Velsko](https://github.com/ivelsko)
+- [Işın Altınkaya](https://github.com/isinaltinkaya)
+- [Johan Nylander](https://github.com/nylander)
+- [Jonas Niemann](https://github.com/NiemannJ)
+- [Katerine Eaton](https://github.com/ktmeaton)
+- [Kathrin Nägele](https://github.com/KathrinNaegele)
+- [Kevin Lord](https://github.com/lordkev)
+- [Luc Venturini](https://github.com/lucventurini)
+- [Mahesh Binzer-Panchal](https://github.com/mahesh-panchal)
+- [Marcel Keller](https://github.com/marcel-keller)
+- [Megan Michel](https://github.com/meganemichel)
+- [Merlin Szymanski](https://github.com/merszym)
+- [Pierre Lindenbaum](https://github.com/lindenb)
+- [Pontus Skoglund](https://github.com/pontussk)
+- [Raphael Eisenhofer](https://github.com/EisenRa)
+- [Roberta Davidson](https://github.com/roberta-davidson)
+- [Rodrigo Barquera](https://github.com/RodrigoBarquera)
+- [Selina Carlhoff](https://github.com/scarlhoff)
+- [Torsten Günter](https://bitbucket.org/tguenther)
 
 ## Contributions and Support
 

@@ -90,10 +171,9 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 
 ## Citations
 
-<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
-<!-- If you use nf-core/eager for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
+If you use nf-core/eager for your analysis, please cite it using the following doi:
 
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
+> Fellows Yates JA, Lamnidis TC, Borry M, Valtueña Andrades A, Fagernäs Z, Clayton S, Garcia MU, Neukamm J, Peltzer A. 2021. Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager. PeerJ 9:e10947. DOI: [10.7717/peerj.10947](https://doi.org/10.7717/peerj.10947).
 
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
 
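
The README's Usage hunk describes each samplesheet row as a single-end fastq file, a pair of fastq files, and/or a BAM file. The sketch below shows one way to read such a sheet and report what each row provides. It rests on several assumptions: the file is tab-separated (suggested by the `.tsv` name), `NA` marks an empty field (as in the example rows), the first column is `sample_id` (the hunk shows `ample_id`), and the helper `input_kinds` is hypothetical rather than part of the pipeline.

```python
import csv
import io

# Header as in the README example, with the first column assumed to be `sample_id`.
HEADER = ["sample_id", "library_id", "lane", "colour_chemistry", "pairment",
          "strandedness", "damage_treatment", "r1", "r2", "bam", "bam_reference_id"]

# Rows copied from the README example; the paths are placeholders.
ROWS = [
    ["sample1", "sample1_a", "1", "4", "paired", "double", "none",
     "/<path>/<to>/sample1_a_l1_r1.fq.gz", "/<path>/<to>/sample1_a_l1_r2.fq.gz", "NA", "NA"],
    ["sample2", "sample2_a", "2", "2", "single", "double", "full",
     "/<path>/<to>/sample2_a_l1_r1.fq.gz", "NA", "NA", "NA"],
    ["sample3", "sample3_a", "8", "4", "single", "double", "half",
     "NA", "NA", "/<path>/<to>/sample31_a.bam", "Mammoth_MT_Krause"],
]


def input_kinds(row: dict) -> list:
    """List the input types a row provides, treating 'NA' as an empty field."""
    kinds = []
    if row["r1"] != "NA" and row["r2"] != "NA":
        kinds.append("paired-end fastq")
    elif row["r1"] != "NA":
        kinds.append("single-end fastq")
    if row["bam"] != "NA":
        kinds.append("bam")
    return kinds


# Build the sheet in memory with explicit tabs, then parse it back.
tsv_text = "\n".join("\t".join(fields) for fields in [HEADER, *ROWS])
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
summary = {row["sample_id"]: input_kinds(row) for row in reader}
print(summary)
```

A row for which `input_kinds` returns an empty list would carry no usable input, which makes this a simple pre-flight check before launching a run.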

477 KB
Binary file not shown.
