
Commit 08a0894

Merge branch 'dev' into new-images

2 parents 47089ab + 492b1aa commit 08a0894

11 files changed: 278 additions & 126 deletions

.github/workflows/ci.yml

Lines changed: 10 additions & 4 deletions
@@ -146,16 +146,16 @@ jobs:
           nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_pmdtools
       - name: GENOTYPING_UG AND MULTIVCFANALYZER Test running GATK UnifiedGenotyper and MultiVCFAnalyzer, additional VCFs
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_genotyping --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer --additional_vcf_files 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/vcf/JK2772_CATCAGTGAGTAGA_L008_R1_001.fastq.gz.tengrand.fq.combined.fq.mapped_rmdup.bam.unifiedgenotyper.vcf.gz' --write_allele_frequencies
+          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_genotyping --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer --additional_vcf_files 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/vcf/JK2772_CATCAGTGAGTAGA_L008_R1_001.fastq.gz.tengrand.fq.combined.fq.mapped_rmdup.bam.unifiedgenotyper.vcf.gz' --write_allele_frequencies
       - name: COMPLEX LANE/LIBRARY MERGING Test running lane and library merging prior to GATK UnifiedGenotyper and running MultiVCFAnalyzer
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_complex,docker --run_genotyping --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer
+          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_complex,docker --run_genotyping --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer
       - name: GENOTYPING_UG ON TRIMMED BAM Test
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_genotyping --run_trim_bam --genotyping_source 'trimmed' --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP'
+          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_genotyping --run_trim_bam --genotyping_source 'trimmed' --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP'
       - name: BAM_INPUT Run the basic pipeline with the bam input profile, skip AdapterRemoval as no convertBam
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --skip_adapterremoval
+          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --skip_adapterremoval
       - name: BAM_INPUT Run the basic pipeline with the bam input profile, convert to FASTQ for adapterremoval test and downstream
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --run_convertinputbam

@@ -167,6 +167,9 @@ jobs:
       - name: METAGENOMIC Run the basic pipeline but with unmapped reads going into MALT
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_bam_filtering --bam_unmapped_type 'fastq' --run_metagenomic_screening --metagenomic_tool 'malt' --database "/home/runner/work/eager/eager/databases/malt/" --malt_sam_output
+      - name: METAGENOMIC Run the basic pipeline but with low-complexity-filtered reads going into MALT
+        run: |
+          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_bam_filtering --bam_unmapped_type 'fastq' --run_metagenomic_screening --metagenomic_tool 'malt' --database "/home/runner/work/eager/eager/databases/malt/" --metagenomic_complexity_filter
       - name: MALTEXTRACT Download resource files
         run: |
           mkdir -p databases/maltextract

@@ -186,3 +189,6 @@ jobs:
       - name: MTNUCRATIO Run basic pipeline with bam input profile, but don't convert BAM, skip everything but mtnucratio
         run: |
           nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_humanbam,docker --skip_fastqc --skip_adapterremoval --skip_deduplication --skip_qualimap --skip_preseq --skip_damage_calculation --run_mtnucratio
+      - name: RESCALING Run basic pipeline but with mapDamage rescaling of BAM files. Note this will be slow
+        run: |
+          nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_mapdamage_rescaling --run_genotyping --genotyping_tool hc --genotyping_source 'rescaled'

CHANGELOG.md

Lines changed: 6 additions & 2 deletions
@@ -7,13 +7,17 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

 ### `Added`

+- [#640](https://github.com/nf-core/eager/issues/640) - Added a pre-metagenomic-screening filter that removes low-sequence-complexity reads with `bbduk`
+- [#583](https://github.com/nf-core/eager/issues/583) - Added `mapDamage2` rescaling of BAM files to remove damage
 - Updated usage (merging files) and workflow images reflecting new functionality.

 ### `Fixed`

 - Removed leftover old DockerHub push CI commands.
-- [#627](https://github.com/nf-core/eager/issues/627) Added de Barros Damgaard citation to README
-- [#630](https://github.com/nf-core/eager/pull/630) Better handling of Qualimap memory requirements and error strategy.
+- [#627](https://github.com/nf-core/eager/issues/627) - Added de Barros Damgaard citation to README
+- [#630](https://github.com/nf-core/eager/pull/630) - Better handling of Qualimap memory requirements and error strategy.
+- Fixed some incomplete schema options to ensure users supply valid input values
+- [#638](https://github.com/nf-core/eager/issues/638#issuecomment-748877567) - Fixed inverted circularfilter filtering (previously filtering happened by default, rather than only when requested by the user as documented)

 ### `Dependencies`

README.md

Lines changed: 36 additions & 43 deletions
@@ -25,7 +25,39 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool

 <img src="docs/images/usage/eager2_workflow.png" alt="nf-core/eager schematic workflow" width="70%">
 </p>

-## Pipeline steps
+## Quick Start
+
+1. Install [`nextflow`](https://nf-co.re/usage/installation) (version >= 20.04.0)
+
+2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`Podman`](https://podman.io/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_
+
+3. Download the pipeline and test it on a minimal dataset with a single command:
+
+   ```bash
+   nextflow run nf-core/eager -profile test,<docker/singularity/podman/conda/institute>
+   ```
+
+   > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
+
+4. Start running your own analysis!
+
+   ```bash
+   nextflow run nf-core/eager -profile <docker/singularity/conda> --input '*_R{1,2}.fastq.gz' --fasta '<your_reference>.fasta'
+   ```
+
+5. Once your run has completed successfully, clean up the intermediate files.
+
+   ```bash
+   nextflow clean -f -k
+   ```
+
+See [usage docs](https://nf-co.re/eager/docs/usage.md) for all of the available options when running the pipeline.
+
+**N.B.** You can see an overview of the run in the MultiQC report located at `./results/MultiQC/multiqc_report.html`
+
+Modifications to the default pipeline are easily made using various options as described in the documentation.
+
+## Pipeline Summary

 ### Default Steps

@@ -77,6 +109,7 @@ Additional functionality contained by the pipeline currently includes:

 #### Metagenomic Screening

+* Low-sequence-complexity filtering (`BBduk`)
 * Taxonomic binner with alignment (`MALT`)
 * Taxonomic binner without alignment (`Kraken2`)
 * aDNA characteristic screening of taxonomically binned data from MALT (`MaltExtract`)

@@ -89,48 +122,6 @@ A graphical overview of suggested routes through the pipeline depending on conte

 <img src="docs/images/usage/eager2_metromap_complex.png" alt="nf-core/eager metro map" width="70%">
 </p>

-## Quick Start
-
-1. Install [`nextflow`](https://nf-co.re/usage/installation) (version >= 20.04.0)
-
-2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`Podman`](https://podman.io/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_
-
-3. Download the pipeline and test it on a minimal dataset with a single command:
-
-   ```bash
-   nextflow run nf-core/eager -profile test,<docker/singularity/podman/conda/institute>
-   ```
-
-   > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
-
-4. Start running your own analysis!
-
-   ```bash
-   nextflow run nf-core/eager -profile <docker/singularity/conda> --input '*_R{1,2}.fastq.gz' --fasta '<your_reference>.fasta'
-   ```
-
-5. Once your run has completed successfully, clean up the intermediate files.
-
-   ```bash
-   nextflow clean -f -k
-   ```
-
-See [usage docs](https://nf-co.re/eager/docs/usage.md) for all of the available options when running the pipeline.
-
-**N.B.** You can see an overview of the run in the MultiQC report located at `./results/MultiQC/multiqc_report.html`
-
-Modifications to the default pipeline are easily made using various options
-as described in the documentation.
-
-## Pipeline Summary
-
-By default, the pipeline currently performs the following:
-
-<!-- TODO nf-core: Fill in short bullet-pointed list of default steps of pipeline -->
-
-* Sequencing quality control (`FastQC`)
-* Overall pipeline run summaries (`MultiQC`)
-
 ## Documentation

 The nf-core/eager pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/eager/usage) and [output](https://nf-co.re/eager/output).

@@ -236,6 +227,8 @@ In addition, references of tools and data used in this pipeline are as follows:

 * **Bowtie2** Langmead, B. and Salzberg, S. L. 2012 Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), p. 357–359. doi: [10.1038/nmeth.1923](https://dx.doi.org/10.1038/nmeth.1923).
 * **sequenceTools** Stephan Schiffels (Unpublished). Download: [https://github.com/stschiff/sequenceTools](https://github.com/stschiff/sequenceTools)
 * **EigenstratDatabaseTools** Thiseas C. Lamnidis (Unpublished). Download: [https://github.com/TCLamnidis/EigenStratDatabaseTools.git](https://github.com/TCLamnidis/EigenStratDatabaseTools.git)
+* **mapDamage2** Jónsson, H., et al. 2013. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics, 29(13), 1682–1684. [https://doi.org/10.1093/bioinformatics/btt193](https://doi.org/10.1093/bioinformatics/btt193)
+* **BBduk** Brian Bushnell (Unpublished). Download: [https://sourceforge.net/projects/bbmap/](https://sourceforge.net/projects/bbmap/)

 ## Data References

assets/multiqc_config.yaml

Lines changed: 1 addition & 2 deletions
@@ -6,7 +6,6 @@ report_comment: >
     This report has been generated by the <a href="https://github.com/nf-core/eager" target="_blank">nf-core/eager</a>
     analysis pipeline. For information about how to interpret these results, please see the
     <a href="https://github.com/nf-core/eager" target="_blank">documentation</a>.
-
 run_modules:
   - adapterRemoval
   - bowtie2

@@ -270,4 +269,4 @@ report_section_order:
   nf-core-eager-summary:
     order: -1001

-export_plots: true
+export_plots: true

bin/scrape_software_versions.py

Lines changed: 7 additions & 2 deletions
@@ -35,7 +35,9 @@
     'VCF2genome':['v_vcf2genome.txt', r"VCF2Genome \(v. ([0-9].[0-9]+) "],
     'endorS.py':['v_endorSpy.txt', r"endorS.py (\S+)"],
     'kraken':['v_kraken.txt', r"Kraken version (\S+)"],
-    'eigenstrat_snp_coverage':['v_eigenstrat_snp_coverage.txt',r"(\S+)"]
+    'eigenstrat_snp_coverage':['v_eigenstrat_snp_coverage.txt',r"(\S+)"],
+    'mapDamage2':['v_mapdamage.txt',r"(\S+)"],
+    'bbduk':['v_bbduk.txt',r"(\S+)"]
 }

 results = OrderedDict()

@@ -55,7 +57,7 @@
 results['Qualimap'] = '<span style="color:#999999;\">N/A</span>'
 results['Preseq'] = '<span style="color:#999999;\">N/A</span>'
 results['GATK HaplotypeCaller'] = '<span style="color:#999999;\">N/A</span>'
-#results['GATK UnifiedGenotyper'] = '<span style="color:#999999;\">N/A</span>'
+results['GATK UnifiedGenotyper'] = '<span style="color:#999999;\">N/A</span>'
 results['freebayes'] = '<span style="color:#999999;\">N/A</span>'
 results['sequenceTools'] = '<span style="color:#999999;\">N/A</span>'
 results['VCF2genome'] = '<span style="color:#999999;\">N/A</span>'

@@ -71,6 +73,9 @@
 results['kraken'] = '<span style="color:#999999;\">N/A</span>'
 results['maltextract'] = '<span style="color:#999999;\">N/A</span>'
 results['eigenstrat_snp_coverage'] = '<span style="color:#999999;\">N/A</span>'
+results['mapDamage2'] = '<span style="color:#999999;\">N/A</span>'
+results['bbduk'] = '<span style="color:#999999;\">N/A</span>'

 # Search each file using its regex
 for k, v in regexes.items():
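The `regexes` additions above follow the script's existing pattern: each entry maps a tool name to a version file emitted by the pipeline and a regex whose first capture group is the version string. A minimal, self-contained sketch of how such a scraper consumes those entries (the `fake_files` dictionary and its version strings are invented stand-ins for reading the real `v_*.txt` files):

```python
import re
from collections import OrderedDict

# Tool name -> [version file, capture regex], mirroring the entries added
# to bin/scrape_software_versions.py in this commit.
regexes = {
    'mapDamage2': ['v_mapdamage.txt', r"(\S+)"],
    'bbduk': ['v_bbduk.txt', r"(\S+)"],
}

# Invented stand-in for the version files each process writes to disk.
fake_files = {
    'v_mapdamage.txt': "2.2.0\n",
    'v_bbduk.txt': "38.87\n",
}

# Default every tool to 'N/A', as the real script does with HTML spans.
results = OrderedDict((tool, 'N/A') for tool in regexes)

for tool, (fname, pattern) in regexes.items():
    match = re.search(pattern, fake_files.get(fname, ''))
    if match:
        results[tool] = match.group(1)

print(results['mapDamage2'])  # 2.2.0
print(results['bbduk'])       # 38.87
```

Because `(\S+)` simply grabs the first non-whitespace token, it works for tools whose version file contains only the bare version number, which is why the new `mapDamage2` and `bbduk` entries need no tool-specific prefix in their regex.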

conf/test_resources.config

Lines changed: 4 additions & 0 deletions
@@ -51,4 +51,8 @@ process {
     time = { check_max( 10.m * task.attempt, 'time' ) }
   }

+  withName:'mapdamage_rescaling'{
+    time = { check_max( 20.m * task.attempt, 'time' ) }
+  }
+
 }

docs/output.md

Lines changed: 2 additions & 0 deletions
@@ -658,11 +658,13 @@ Each module has its own output directory which sits alongside the `MultiQC/` dir
 - `damageprofiler/` - this contains sample specific directories containing raw statistics and damage plots from DamageProfiler. The `.pdf` files can be used to visualise C to T miscoding lesions or read length distributions of your mapped reads. All raw statistics used for the PDF plots are contained in the `.txt` files.
 - `pmdtools/` - this contains raw output statistics of pmdtools (estimates of frequencies of substitutions), and BAM files which have been filtered to remove reads that do not have a post-mortem damage (PMD) score of `--pmdtools_threshold`.
 - `trimmed_bam/` - this contains the BAM files with X number of bases trimmed off as defined with the `--bamutils_clip_half_udg_left`, `--bamutils_clip_half_udg_right`, `--bamutils_clip_none_udg_left`, and `--bamutils_clip_none_udg_right` flags, and corresponding index files. You can use these BAM files for downstream analysis such as re-mapping data with more stringent parameters (if you set trimming to remove the most likely places containing damage in the read).
+- `damage_rescaling/` - this contains rescaled BAM files from mapDamage2. These BAM files have damage probabilistically removed via a Bayesian model, and can be used for downstream genotyping.
 - `genotyping/` - this contains all the (gzipped) genotyping files produced by your genotyping module. The file suffix will have the genotyping tool name. You will have files corresponding to each of your deduplicated BAM files (except pileupcaller), or any turned-on downstream processes that create BAMs (e.g. trimmed bams or pmd tools). If `--gatk_ug_keep_realign_bam` is supplied, this may also contain BAM files from InDel realignment when using GATK 3 and UnifiedGenotyper for variant calling. When pileupcaller is used to create eigenstrat genotypes, this directory also contains eigenstrat SNP coverage statistics.
 - `multivcfanalyzer/` - this contains all output from MultiVCFAnalyzer, including SNP calling statistics, various SNP table(s) and FASTA alignment files.
 - `sex_determination/` - this contains the output for the sex determination run. This is a single `.tsv` file that includes a table with the sample name, the number of autosomal SNPs, number of SNPs on the X/Y chromosome, the number of reads mapping to the autosomes, the number of reads mapping to the X/Y chromosome, the relative coverage on the X/Y chromosomes, and the standard error associated with the relative coverages. These measures are provided for each BAM file, one row per file. If the `sexdeterrmine_bedfile` option has not been provided, the error bars cannot be trusted, and runtime will be considerably longer.
 - `nuclear_contamination/` - this contains the output of the nuclear contamination processes. The directory contains one `*.X.contamination.out` file per individual, as well as `nuclear_contamination.txt`, which is a summary table of the results for all individuals. `nuclear_contamination.txt` contains a header, followed by one line per individual, comprised of the Method of Moments (MOM) and Maximum Likelihood (ML) contamination estimates (with their respective standard errors) for both Method1 and Method2.
 - `bedtools/` - this contains two files as the output from bedtools coverage. One file contains the 'breadth' coverage (`*.breadth.gz`). This file will have the contents of your annotation file (e.g. BED/GFF), and the following subsequent columns: no. reads on feature, no. bases at depth, length of feature, and % of feature. The second file (`*.depth.gz`) contains the contents of your annotation file (e.g. BED/GFF), and an additional column which is mean depth coverage (i.e. average number of reads covering each position).
+- `metagenomic_complexity_filter/` - this contains the output from the removal of low-sequence-complexity reads with `bbduk` prior to metagenomic classification. This includes the filtered FASTQ files (`*_lowcomplexityremoved.fq.gz`) and the run-time log (`*_bbduk.stats`) for each sample. **Note:** there are no sections in the MultiQC report for this module, therefore you must check the `*_bbduk.stats` files to get summary statistics of the filtering.
 - `metagenomic_classification/` - this contains the output for a given metagenomic classifier.
   - Running MALT will contain RMA6 files that can be loaded into MEGAN6 or MaltExtract for phylogenetic visualisation of read taxonomic assignments and aDNA characteristics respectively. Additionally, a `malt.log` file is provided which gives information such as run-time, memory usage and per-sample statistics of numbers of alignments with taxonomic assignment etc. This will also include gzipped SAM files if requested.
   - Running kraken will contain the Kraken output and report files, as well as a merged taxon count table.
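The `bedtools/` entry above describes the column layout of the 'breadth' table: the annotation (e.g. BED) columns come first, followed by reads on feature, bases at depth, feature length, and fraction of the feature covered. A minimal sketch of reading a gzipped table with that layout (the file name, feature names, and rows here are invented for illustration):

```python
import gzip
import tempfile
from pathlib import Path

# Invented example rows in the layout described above: BED columns first,
# then reads-on-feature, bases-at-depth, feature length, fraction covered.
content = (
    "chr1\t100\t200\tgeneA\t15\t80\t100\t0.8000000\n"
    "chr1\t300\t450\tgeneB\t0\t0\t150\t0.0000000\n"
)

# Write a small gzipped table standing in for a real '*.breadth.gz' output.
path = Path(tempfile.mkdtemp()) / "example.breadth.gz"
with gzip.open(path, "wt") as handle:
    handle.write(content)

rows = []
with gzip.open(path, "rt") as handle:
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        name = fields[3]            # BED 'name' column
        n_reads = int(fields[-4])   # reads overlapping the feature
        fraction = float(fields[-1])  # fraction of feature covered
        rows.append((name, n_reads, fraction))

print(rows)
```

Indexing the appended columns from the end of each row keeps the sketch independent of how many annotation columns the input BED/GFF carried.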

environment.yml

Lines changed: 3 additions & 0 deletions
@@ -47,3 +47,6 @@ dependencies:
   - conda-forge::xopen=0.9.0
   - bioconda::bowtie2=2.4.1
   - bioconda::eigenstratdatabasetools=1.0.2
+  - bioconda::mapdamage2=2.2.0
+  - bioconda::bbmap=38.87
+
