Skip to content

Commit 4528211

Browse files
authored
Merge pull request #853 from nf-core/pmdtools_mask
Pmdtools reference masking fix , sequencetools update
2 parents 9a66d39 + ec266bd commit 4528211

5 files changed

Lines changed: 44 additions & 17 deletions

File tree

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,13 +13,14 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
1313
- [#836](https://github.com/nf-core/eager/issues/836) Remove deprecated parameters from test profiles
1414
- [#838](https://github.com/nf-core/eager/issues/836) Fix --snpcapture_bed files not being picked up by Nextflow
1515
- [#843](https://github.com/nf-core/eager/issues/843) Re-add direct piping of AdapterRemovalFixPrefix to pigz
16+
- [#844](https://github.com/nf-core/eager/issues/844) Fixed reference masking prior to pmdtools
1617
- [#845](https://github.com/nf-core/eager/issues/845) Updates parameter documention to specify `-s` preseq parameter also applies to lc_extrap
1718
- [#851](https://github.com/nf-core/eager/issues/851) Fixes a file-name clash during additional_library_merge, post-BAM trimming of different UDG treated libraries of a sample
18-
- Fix PMDtools reference mask not being picked up by Nextflow, and it's use being evaluated against --snpcapture_bed rather than --pmdtools_reference_mask
1919
- Renamed a range of MultiQC general stats table headers to improve clarity, documentation has been updated accordingly
2020

2121
### `Dependencies`
2222

23+
- [#829](https://github.com/nf-core/eager/issues/829) Bumped sequencetools: 1.4.0.5 -> 1.5.2
2324
- Bumped MultiQC: 1.11 -> 1.12 (for run-time optimisation and tool citation information)
2425

2526
### `Deprecated`

docs/output.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -675,6 +675,7 @@ This section gives a brief summary of where to look for what files for downstrea
675675
Each module has it's own output directory which sit alongside the `MultiQC/` directory from which you opened the report.
676676

677677
* `reference_genome/`: this directory contains the indexing files of your input reference genome (i.e. the various `bwa` indices, a `samtools`' `.fai` file, and a picard `.dict`), if you used the `--saveReference` flag.
678+
* When masking of the reference is requested prior to running pmdtools, an additional directory `reference_genome/masked_genome` will be found here, containing the masked reference.
678679
* `fastqc/`: this contains the original per-FASTQ FastQC reports that are summarised with MultiQC. These occur in both `html` (the report) and `.zip` format (raw data). The `after_clipping` folder contains the same but for after AdapterRemoval.
679680
* `adapterremoval/`: this contains the log files (ending with `.settings`) with raw trimming (and merging) statistics after AdapterRemoval. In the `output` sub-directory, are the output trimmed (and merged) `fastq` files. These you can use for downstream applications such as taxonomic binning for metagenomic studies.
680681
* `post_ar_fastq_trimmed`: this contains `fastq` files that have been additionally trimmed after AdapterRemoval (if turned on). These reads are usually that had internal barcodes, or damage that needed to be removed before mapping.

environment.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,7 @@ dependencies:
3131
- bioconda::bedtools=2.30.0
3232
- conda-forge::libiconv=1.16
3333
- conda-forge::pigz=2.6
34-
- bioconda::sequencetools=1.4.0.6
34+
- bioconda::sequencetools=1.5.2
3535
- bioconda::preseq=3.1.2
3636
- bioconda::fastp=0.20.1
3737
- bioconda::bamutil=1.0.15

main.nf

Lines changed: 38 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -185,7 +185,7 @@ if("${params.fasta}".endsWith(".gz")){
185185
path zipped_fasta from file(params.fasta) // path doesn't like it if a string of an object is not prefaced with a root dir (/), so use file() to resolve string before parsing to `path`
186186

187187
output:
188-
path "$unzip" into ch_fasta into ch_fasta_for_bwaindex,ch_fasta_for_bt2index,ch_fasta_for_faidx,ch_fasta_for_seqdict,ch_fasta_for_circulargenerator,ch_fasta_for_circularmapper,ch_fasta_for_damageprofiler,ch_fasta_for_qualimap,ch_fasta_for_pmdtools,ch_fasta_for_genotyping_ug,ch_fasta_for_genotyping_hc,ch_fasta_for_genotyping_freebayes,ch_fasta_for_genotyping_pileupcaller,ch_fasta_for_vcf2genome,ch_fasta_for_multivcfanalyzer,ch_fasta_for_genotyping_angsd,ch_fasta_for_damagerescaling,ch_fasta_for_bcftools_stats
188+
path "$unzip" into ch_fasta into ch_fasta_for_bwaindex,ch_fasta_for_bt2index,ch_fasta_for_faidx,ch_fasta_for_seqdict,ch_fasta_for_circulargenerator,ch_fasta_for_circularmapper,ch_fasta_for_damageprofiler,ch_fasta_for_qualimap,ch_unmasked_fasta_for_masking,ch_unmasked_fasta_for_pmdtools,ch_fasta_for_genotyping_ug,ch_fasta_for_genotyping_hc,ch_fasta_for_genotyping_freebayes,ch_fasta_for_genotyping_pileupcaller,ch_fasta_for_vcf2genome,ch_fasta_for_multivcfanalyzer,ch_fasta_for_genotyping_angsd,ch_fasta_for_damagerescaling,ch_fasta_for_bcftools_stats
189189

190190
script:
191191
unzip = zipped_fasta.toString() - '.gz'
@@ -196,7 +196,7 @@ if("${params.fasta}".endsWith(".gz")){
196196
} else {
197197
fasta_for_indexing = Channel
198198
.fromPath("${params.fasta}", checkIfExists: true)
199-
.into{ ch_fasta_for_bwaindex; ch_fasta_for_bt2index; ch_fasta_for_faidx; ch_fasta_for_seqdict; ch_fasta_for_circulargenerator; ch_fasta_for_circularmapper; ch_fasta_for_damageprofiler; ch_fasta_for_qualimap; ch_fasta_for_pmdtools; ch_fasta_for_genotyping_ug; ch_fasta__for_genotyping_hc; ch_fasta_for_genotyping_hc; ch_fasta_for_genotyping_freebayes; ch_fasta_for_genotyping_pileupcaller; ch_fasta_for_vcf2genome; ch_fasta_for_multivcfanalyzer;ch_fasta_for_genotyping_angsd;ch_fasta_for_damagerescaling;ch_fasta_for_bcftools_stats }
199+
.into{ ch_fasta_for_bwaindex; ch_fasta_for_bt2index; ch_fasta_for_faidx; ch_fasta_for_seqdict; ch_fasta_for_circulargenerator; ch_fasta_for_circularmapper; ch_fasta_for_damageprofiler; ch_fasta_for_qualimap; ch_unmasked_fasta_for_masking; ch_unmasked_fasta_for_pmdtools; ch_fasta_for_genotyping_ug; ch_fasta__for_genotyping_hc; ch_fasta_for_genotyping_hc; ch_fasta_for_genotyping_freebayes; ch_fasta_for_genotyping_pileupcaller; ch_fasta_for_vcf2genome; ch_fasta_for_multivcfanalyzer;ch_fasta_for_genotyping_angsd;ch_fasta_for_damagerescaling;ch_fasta_for_bcftools_stats }
200200
}
201201

202202
// Check that fasta index file path ends in '.fai'
@@ -247,15 +247,16 @@ if ( !params.clip_adapters_list ) {
247247
}
248248

249249
if ( params.snpcapture_bed ) {
250-
Channel.fromPath(params.snpcapture_bed, checkIfExists: true).into { ch_snpcapture_bed; ch_snpcapture_bed_pmd }
250+
Channel.fromPath(params.snpcapture_bed, checkIfExists: true).set { ch_snpcapture_bed }
251251
} else {
252-
Channel.fromPath("$projectDir/assets/nf-core_eager_dummy.txt").into { ch_snpcapture_bed; ch_snpcapture_bed_pmd }
252+
Channel.fromPath("$projectDir/assets/nf-core_eager_dummy.txt").set { ch_snpcapture_bed }
253253
}
254254

255-
if ( params.pmdtools_reference_mask ) {
256-
ch_pmdtoolsmask = Channel.fromPath(params.pmdtools_reference_mask, checkIfExists: true)
255+
// Set up channel with pmdtools reference mask bedfile
256+
if (!params.pmdtools_reference_mask) {
257+
ch_bedfile_for_reference_masking = Channel.fromPath("$projectDir/assets/nf-core_eager_dummy.txt")
257258
} else {
258-
ch_pmdtoolsmask = Channel.fromPath("$projectDir/assets/nf-core_eager_dummy2.txt")
259+
ch_bedfile_for_reference_masking = Channel.fromPath(params.pmdtools_reference_mask, checkIfExists: true)
259260
}
260261

261262
// SexDetermination channel set up and bedfile validation
@@ -2136,6 +2137,34 @@ process mapdamage_rescaling {
21362137

21372138
// Optionally perform further aDNA evaluation or filtering for just reads with damage etc.
21382139

2140+
process mask_reference_for_pmdtools {
2141+
label 'sc_tiny'
2142+
tag "${fasta}"
2143+
publishDir "${params.outdir}/reference_genome/masked_reference", mode: params.publish_dir_mode
2144+
2145+
when: (params.pmdtools_reference_mask && params.run_pmdtools)
2146+
2147+
input:
2148+
file fasta from ch_unmasked_fasta_for_masking
2149+
file bedfile from ch_bedfile_for_reference_masking
2150+
2151+
output:
2152+
file "${fasta.baseName}_masked.fa" into ch_masked_fasta_for_pmdtools
2153+
2154+
script:
2155+
log.info "[nf-core/eager]: Masking reference \'${fasta}\' at positions found in \'${bedfile}\'. Masked reference will be used for pmdtools."
2156+
"""
2157+
bedtools maskfasta -fi ${fasta} -bed ${bedfile} -fo ${fasta.baseName}_masked.fa
2158+
"""
2159+
}
2160+
2161+
// If masking was requested, use masked reference for pmdtools, else use original reference
2162+
if (params.pmdtools_reference_mask) {
2163+
ch_masked_fasta_for_pmdtools.set{ch_fasta_for_pmdtools}
2164+
} else {
2165+
ch_unmasked_fasta_for_pmdtools.set{ch_fasta_for_pmdtools}
2166+
}
2167+
21392168
process pmdtools {
21402169
label 'mc_medium'
21412170
tag "${libraryid}"
@@ -2146,8 +2175,6 @@ process pmdtools {
21462175
input:
21472176
tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path(bam), path(bai) from ch_rmdup_for_pmdtools
21482177
file fasta from ch_fasta_for_pmdtools.collect()
2149-
path snpcapture_bed from ch_snpcapture_bed_pmd
2150-
path pmdtools_reference_mask from ch_pmdtoolsmask
21512178

21522179
output:
21532180
tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, path("*.pmd.bam"), path("*.pmd.bam.{bai,csi}") into ch_output_from_pmdtools
@@ -2156,18 +2183,16 @@ process pmdtools {
21562183
script:
21572184
//Check which treatment for the libraries was used
21582185
def treatment = udg ? (udg == 'half' ? '--UDGhalf' : '--CpG') : '--UDGminus'
2159-
def snpcap = snpcapture_bed.getName() != 'nf-core_eager_dummy.txt' ? "--refseq ${pmdtools_reference_mask} --basecomposition" : ''
2160-
if ( snpcapture_bed.getName() != 'nf-core_eager_dummy.txt' && !params.pmdtools_reference_mask ) { log.info "[nf-core/eager] warn: No reference mask specified for PMDtools, therefore ignoring that for downstream analysis!" }
21612186
def size = params.large_ref ? '-c' : ''
21622187
def platypus = params.pmdtools_platypus ? '--platypus' : ''
21632188
"""
21642189
#Run Filtering step
2165-
samtools calmd ${bam} ${fasta} | pmdtools --threshold ${params.pmdtools_threshold} ${treatment} ${snpcap} --header | samtools view -Sb - > "${libraryid}".pmd.bam
2190+
samtools calmd ${bam} ${fasta} | pmdtools --threshold ${params.pmdtools_threshold} ${treatment} --header | samtools view -Sb - > "${libraryid}".pmd.bam
21662191
21672192
#Run Calc Range step
21682193
## To allow early shut off of pipe: https://github.com/nextflow-io/nextflow/issues/1564
21692194
trap 'if [[ \$? == 141 ]]; then echo "Shutting samtools early due to -n parameter" && samtools index ${libraryid}.pmd.bam ${size}; exit 0; fi' EXIT
2170-
samtools calmd ${bam} ${fasta} | pmdtools --deamination ${platypus} --range ${params.pmdtools_range} ${treatment} ${snpcap} -n ${params.pmdtools_max_reads} > "${libraryid}".cpg.range."${params.pmdtools_range}".txt
2195+
samtools calmd ${bam} ${fasta} | pmdtools --deamination ${platypus} --range ${params.pmdtools_range} ${treatment} -n ${params.pmdtools_max_reads} > "${libraryid}".cpg.range."${params.pmdtools_range}".txt
21712196
21722197
samtools index ${libraryid}.pmd.bam ${size}
21732198
"""

nextflow_schema.json

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -889,9 +889,9 @@
889889
},
890890
"pmdtools_reference_mask": {
891891
"type": "string",
892-
"description": "Specify a path to reference mask for PMDTools.",
892+
"description": "Specify a bedfile to be used to mask the reference fasta prior to running pmdtools.",
893893
"fa_icon": "fas fa-mask",
894-
"help_text": "Can be used to set a path to a reference genome mask for PMDTools."
894+
"help_text": "Activates masking of the reference fasta prior to running pmdtools. Positions that are in the provided bedfile will be replaced by Ns in the reference genome. This is useful for capture data, where you might not want the allele of a SNP to be counted as damage when it is a transition. Masking of the reference is done using `bedtools maskfasta`."
895895
},
896896
"pmdtools_max_reads": {
897897
"type": "integer",

0 commit comments

Comments
 (0)