Merged
73 commits
6ce647d
Add cram support, read splitting
FriederikeHanssen Jun 15, 2021
21a695a
Add estimate library complexity if spark is used
FriederikeHanssen Jun 16, 2021
a634a79
Fixes resume problem, but is losing the file name...
FriederikeHanssen Jun 16, 2021
1632d4b
Add tmp dir to gatk processes so tmp files are written to the proper …
FriederikeHanssen Jun 16, 2021
ab0c1c8
Fix filename display + resume for TABIX
FriederikeHanssen Jun 16, 2021
e8290a1
Try to get spark to work
FriederikeHanssen Jun 16, 2021
809d321
Add MDSpark back in
FriederikeHanssen Jun 16, 2021
6ba8720
try with runoptions
FriederikeHanssen Jun 16, 2021
46dcdc2
The newest gatk container is not working for me with spark, 4.1.9.0 is
FriederikeHanssen Jun 16, 2021
41649cc
Add docker.userEmulation back in
FriederikeHanssen Jun 16, 2021
b4dd4ca
Add more spark things
FriederikeHanssen Jun 21, 2021
147d6b4
Fix module params
FriederikeHanssen Jun 21, 2021
53d69fa
Publish ref
FriederikeHanssen Jun 21, 2021
5b4fc53
try with path instead of file
FriederikeHanssen Jun 21, 2021
1158010
try with path instead of file
FriederikeHanssen Jun 21, 2021
9b1768f
try with fromFile instead
FriederikeHanssen Jun 21, 2021
6323fbe
file
FriederikeHanssen Jun 21, 2021
838932f
whole fixownership seems to work
FriederikeHanssen Jun 21, 2021
dd4f783
Merge remote-tracking branch 'upstream/dsl2' into dsl2
FriederikeHanssen Jun 21, 2021
39403ec
Add numLanes to meta sheet to deal with blocked mapping output channels
FriederikeHanssen Jun 22, 2021
a62acb1
Add channel dumping to check for missing id
FriederikeHanssen Jun 22, 2021
cd51cbc
use groupKey instead
FriederikeHanssen Jun 22, 2021
21c168e
not sure if this works, but run a bigger test for this
FriederikeHanssen Jun 22, 2021
d4354d6
Simplify mapping epression
FriederikeHanssen Jun 23, 2021
26448a6
remove unused sw & add ref to samtools stats
FriederikeHanssen Jun 23, 2021
87b6831
Add skip_fastqc in again
FriederikeHanssen Jun 23, 2021
5184462
try with exporting ref path and cache
FriederikeHanssen Jun 23, 2021
46effa4
Remove quotes
FriederikeHanssen Jun 23, 2021
70d28c0
add mutect2 somatic module
FriederikeHanssen Jun 23, 2021
a920909
Try to circumvent stats issues with view
FriederikeHanssen Jun 23, 2021
2047dc8
Add ref to cram merge
FriederikeHanssen Jun 24, 2021
112aaf0
Add ref to cram merge
FriederikeHanssen Jun 24, 2021
52632e6
Use double quotes for output
FriederikeHanssen Jun 24, 2021
144fef7
Add ref to stats
FriederikeHanssen Jun 24, 2021
44c5bdf
fix logic for bam to cram conversion
FriederikeHanssen Jun 24, 2021
1d5b395
Simplify if
FriederikeHanssen Jun 24, 2021
6a77b3c
Add memory overhead for gatk based tools
FriederikeHanssen Jun 24, 2021
5ab6078
Add mutect2 somatic
FriederikeHanssen Jun 24, 2021
e1e0d34
add conf
FriederikeHanssen Jun 25, 2021
dbe4bdb
remove failing dumo statement
FriederikeHanssen Jun 25, 2021
951b78d
select spark tools
FriederikeHanssen Jun 25, 2021
f08901f
change use_gatk_spark in bwamem2
FriederikeHanssen Jun 26, 2021
d44e522
change use_gatk_spark in bwamem2
FriederikeHanssen Jun 26, 2021
dda8b13
add dump tag to figure out why bqsr is not working
FriederikeHanssen Jun 27, 2021
db3d98a
try withotu clone to get to work on aws
FriederikeHanssen Jun 27, 2021
287146f
remove meta.id
FriederikeHanssen Jun 27, 2021
2732291
USe channels for known sites
FriederikeHanssen Jun 27, 2021
c398e01
Try to fix known_sites channel
FriederikeHanssen Jun 27, 2021
a665e66
add groupTuple back in
FriederikeHanssen Jun 27, 2021
bccdb7d
change dbsnp/knownindels channel
FriederikeHanssen Jun 27, 2021
118f9c1
fix multiple knwonindels input
FriederikeHanssen Jun 27, 2021
c2e73c6
add dump statements, why are the intervals not working for humans
FriederikeHanssen Jun 27, 2021
d5cec16
sth of when providing multiple indices
FriederikeHanssen Jun 27, 2021
777f7a2
add tbi back in
FriederikeHanssen Jun 27, 2021
286bbe3
concat seems to fix this channel madness
FriederikeHanssen Jun 27, 2021
3fd8078
hardcode number of intervals for tests
FriederikeHanssen Jun 27, 2021
493a147
fix docker image tag, can't find singularity one
FriederikeHanssen Jun 27, 2021
dd88aa3
add haplotypecalelr back in
FriederikeHanssen Jun 27, 2021
ec7a2d5
count num intervals with map oprator
FriederikeHanssen Jun 28, 2021
1be7416
collect dbsnp tbi to avoid consumption of channel
FriederikeHanssen Jun 28, 2021
206db89
add gvcf back in
FriederikeHanssen Jun 28, 2021
9ce70a0
Add bamqc after bqsr with crams
FriederikeHanssen Jun 29, 2021
047e5cf
Use docker image for htslib + singularity
FriederikeHanssen Jun 29, 2021
19c911a
add dbsnp back in
FriederikeHanssen Jun 29, 2021
2342971
add dbsnp back in
FriederikeHanssen Jun 29, 2021
5e8f2cc
Resolve merge conflicts
FriederikeHanssen Jun 29, 2021
4f974f6
add step/tools to indices wf
FriederikeHanssen Jun 29, 2021
d766d44
Resolve remaining merge conflicts/fix problems
FriederikeHanssen Jun 29, 2021
3a37778
add try/catch to figure out why module conf is not loaded
FriederikeHanssen Jun 29, 2021
22d55ab
Split by num reads instead of parts to generate similar sized files
FriederikeHanssen Jul 2, 2021
3f26535
Code clean up
FriederikeHanssen Jul 15, 2021
069a4f1
Fix merge conflicts
FriederikeHanssen Jul 15, 2021
1866022
apply suggestions from code review
FriederikeHanssen Jul 15, 2021
65 changes: 48 additions & 17 deletions conf/modules.config
@@ -57,6 +57,10 @@ params {
publish_files = false
}
// MAPPING
'seqkit_split2' {
args = "--by-size ${params.split_fastq}"
publish_files = false
}
'bwa_mem1_mem' {
args = '-K 100000000 -M'
args2 = 'sort'
@@ -77,6 +81,28 @@
args2 = 'sort'
publish_files = false
}
// MARKDUPLICATES
'markduplicates' {
args = 'REMOVE_DUPLICATES=false VALIDATION_STRINGENCY=LENIENT'
suffix = '.md'
publish_by_meta = true
publish_dir = 'preprocessing'
publish_files = false
}
'markduplicatesspark' {
args = '--remove-sequencing-duplicates false -VS LENIENT'
suffix = '.md'
publish_by_meta = true
publish_dir = 'preprocessing'
publish_files = ['cram': 'markduplicates', 'crai': 'markduplicates']
}
'estimatelibrarycomplexity' {
args = ''
suffix = '.md'
publish_by_meta = true
publish_dir = 'preprocessing'
publish_files = ['metrics': 'markduplicates']
}
'merge_bam_mapping' {
publish_by_meta = true
publish_files = ['bam':'mapped']
@@ -87,36 +113,32 @@ params {
publish_by_meta = true
publish_dir = 'reports/qualimap'
}
'samtools_index_mapping' {
publish_by_meta = true
publish_files = ['bai':'mapped']
publish_dir = 'preprocessing'
}
'samtools_stats_mapping' {
publish_by_meta = true
publish_dir = 'reports/samtools_stats'
}
// MARKDUPLICATES
'markduplicates' {
args = 'REMOVE_DUPLICATES=false VALIDATION_STRINGENCY=LENIENT'
'samtools_view' {
suffix = '.md'
publish_by_meta = true
publish_dir = 'preprocessing'
publish_files = ['bam': 'markduplicates', 'bai': 'markduplicates']
publish_files = ['cram': 'markduplicates', 'crai': 'markduplicates']
}
'markduplicatesspark' {
args = '--remove-sequencing-duplicates false -VS LENIENT'
suffix = '.md'
'samtools_index_cram' {
publish_by_meta = true
publish_files = ['crai':'sth']
publish_dir = 'preprocessing'
publish_files = ['bam': 'markduplicates', 'bai': 'markduplicates']
}
// PREPARE_RECALIBRATION
'baserecalibrator' {
publish_by_meta = true
publish_dir = 'preprocessing'
publish_files = ['recal.table': 'recal_table']
}
'baserecalibrator_spark' {
publish_by_meta = true
publish_dir = 'preprocessing'
publish_files = ['recal.table': 'recal_table']
}
'gatherbqsrreports' {
publish_by_meta = true
publish_dir = 'preprocessing'
@@ -127,10 +149,14 @@ params {
suffix = '.recal'
publish_files = false
}
'merge_bam_recalibrate' {
'applybqsr_spark' {
suffix = '.recal'
publish_files = false
}
'merge_cram_recalibrate' {
suffix = '.recal'
publish_by_meta = true
publish_files = ['bam':'recalibrated']
publish_files = ['cram':'recalibrated']
publish_dir = 'preprocessing'
}
'qualimap_bamqc_recalibrate' {
@@ -142,7 +168,7 @@
suffix = 'recal'
publish_by_meta = true
publish_dir = 'preprocessing'
publish_files = ['recal.bam':'recalibrated', 'recal.bam.bai':'recalibrated']
publish_files = ['recal.cram':'recalibrated', 'recal.cram.crai':'recalibrated']
}
'samtools_stats_recalibrate' {
publish_by_meta = true
@@ -186,7 +212,7 @@ params {
}

// TUMOR_VARIANT_CALLING

//
// PAIR_VARIANT_CALLING
'manta_somatic' {
publish_by_meta = true
Expand All @@ -208,6 +234,11 @@ params {
publish_dir = 'variant_calling'
publish_files = ['vcf.gz':'strelka', 'vcf.gz.tbi':'strelka']
}
'mutect2_somatic' {
publish_by_meta = true
publish_dir = 'variant_calling'
publish_files = ['vcf.gz':'mutect2', 'vcf.gz.tbi':'mutect2']
}
// ANNOTATE
'snpeff' {
args = '-nodownload -canon -v'
4 changes: 2 additions & 2 deletions conf/test.config
@@ -45,7 +45,7 @@ profiles {
params.save_bam_mapped = true
}
split_fastq {
params.split_fastq = 500
params.split_fastq = 2
}
targeted {
params.target_bed = 'https://raw.githubusercontent.com/nf-core/test-datasets/sarek/testdata/target.bed'
@@ -67,7 +67,7 @@ profiles {
params.trim_fastq = true
}
use_gatk_spark {
params.use_gatk_spark = true
params.use_gatk_spark = 'markduplicates,bqsr'
}
umi_quiaseq {
params.genome = 'smallGRCh38'
2 changes: 1 addition & 1 deletion docs/usage.md
@@ -98,7 +98,7 @@ This is _not_ recommended.
* Specify `--use_gatk_spark`
* `test_split_fastq`
* A profile with a complete configuration for automated testing
* Specify `--split_fastq 500`
* Specify `--split_fastq 2`
* `test_targeted`
* A profile with a complete configuration for automated testing
* Include link to a target `BED` file and use `Manta` and `Strelka` for Variant Calling
4 changes: 3 additions & 1 deletion modules/local/concat_vcf/main.nf
@@ -13,11 +13,13 @@ process CONCAT_VCF {

conda (params.enable_conda ? "bioconda::htslib=1.12" : null)
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
container "https://depot.galaxyproject.org/singularity/htslib:1.12--hd3b49d5_0"
//TODO: No singularity container at the moment, use docker container for the moment
container "quay.io/biocontainers/htslib:1.12--h9093b5e_1"
} else {
container "quay.io/biocontainers/htslib:1.12--hd3b49d5_0"
}


input:
tuple val(meta), path(vcf)
path fai
33 changes: 33 additions & 0 deletions modules/local/index_target_bed/main.nf
@@ -0,0 +1,33 @@
// Import generic module functions
include { initOptions; saveFiles; getSoftwareName } from './functions'

params.options = [:]
options = initOptions(params.options)

process INDEX_TARGET_BED {
tag "$target_bed"
label 'process_medium'
publishDir "${params.outdir}",
mode: params.publish_dir_mode,
saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) }

conda (params.enable_conda ? "bioconda::htslib=1.12" : null)
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
//TODO: No singularity container at the moment, use docker container for the moment
container "quay.io/biocontainers/htslib:1.12--h9093b5e_1"
} else {
container "quay.io/biocontainers/htslib:1.12--hd3b49d5_0"
}

input:
path target_bed

output:
tuple path("${target_bed}.gz"), path("${target_bed}.gz.tbi")

script:
"""
bgzip --threads ${task.cpus} -c ${target_bed} > ${target_bed}.gz
tabix ${target_bed}.gz
"""
}
Comment on lines +1 to +33 (Member): I added that in nf-core/modules

11 changes: 9 additions & 2 deletions modules/nf-core/software/bwa/mem/main.nf
@@ -29,8 +29,15 @@ process BWA_MEM {
script:
def split_cpus = Math.floor(task.cpus/2)
def software = getSoftwareName(task.process)
def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
def part = params.split_fastq > 1 ? reads.get(0).name.findAll(/part_([0-9]+)?/).last().concat('.') : ""
def prefix = options.suffix ? "${meta.id}${options.suffix}.${part}" : "${meta.id}.${part}"
def read_group = meta.read_group ? "-R ${meta.read_group}" : ""

//MD Spark NEEDS name sorted reads or runtime goes through the roof.
//However, if duplicate marking is skipped, reads need to be coordinate sorted.
//Spark can be used also for BQSR, therefore check for both: only name sort if spark + duplicate marking is done
def sort_order = ('markduplicates' in params.use_gatk_spark) & !params.skip_markduplicates ? "-n" : ""

"""
INDEX=`find -L ./ -name "*.amb" | sed 's/.amb//'`

@@ -40,7 +47,7 @@
-t ${split_cpus} \\
\$INDEX \\
$reads \\
| samtools $options.args2 --threads ${split_cpus} -o ${prefix}.bam -
| samtools $options.args2 $sort_order --threads ${split_cpus} -o ${prefix}bam -

echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//' > ${software}.version.txt
"""
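The sort-order comment in this diff reduces to one small predicate: name-sort the mapped reads (`samtools sort -n`) only when Spark duplicate marking will actually run, and coordinate-sort otherwise. A minimal Python sketch of that decision (function name and parameters are illustrative, not part of the pipeline; the real check lives in the Groovy `sort_order` line above):

```python
def samtools_sort_flag(use_gatk_spark, skip_markduplicates):
    """Return the samtools sort flag implied by the diff's logic.

    MarkDuplicatesSpark needs name-sorted input, so '-n' is chosen only
    when 'markduplicates' is among the requested Spark tools AND
    duplicate marking is not skipped; otherwise no flag (coordinate sort).
    """
    spark_tools = [t.strip() for t in use_gatk_spark.split(",")] if use_gatk_spark else []
    return "-n" if ("markduplicates" in spark_tools and not skip_markduplicates) else ""
```

Note that the Groovy version uses substring containment on the `use_gatk_spark` string (e.g. `'markduplicates,bqsr'`); the sketch splits on commas instead, which is the stricter reading of the same intent.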
11 changes: 9 additions & 2 deletions modules/nf-core/software/bwamem2/mem/main.nf
@@ -29,8 +29,15 @@ process BWAMEM2_MEM {
script:
def split_cpus = Math.floor(task.cpus/2)
def software = getSoftwareName(task.process)
def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
def part = params.split_fastq > 1 ? reads.get(0).name.findAll(/part_([0-9]+)?/).last().concat('.') : ""
def prefix = options.suffix ? "${meta.id}${options.suffix}.${part}" : "${meta.id}.${part}"
def read_group = meta.read_group ? "-R ${meta.read_group}" : ""

//MD Spark NEEDS name sorted reads or runtime goes through the roof.
//However, if duplicate marking is skipped, reads need to be coordinate sorted.
//Spark can be used also for BQSR, therefore check for both: only name sort if spark + duplicate marking is done
def sort_order = ('markduplicates' in params.use_gatk_spark) & !params.skip_markduplicates ? "-n" : ""

"""
INDEX=`find -L ./ -name "*.amb" | sed 's/.amb//'`

@@ -40,7 +47,7 @@
-t ${split_cpus} \\
\$INDEX \\
$reads \\
| samtools $options.args2 -@ ${split_cpus} -o ${prefix}.bam -
| samtools $options.args2 $sort_order -@ ${split_cpus} -o ${prefix}bam -

echo \$(bwa-mem2 version 2>&1) > ${software}.version.txt
"""
38 changes: 38 additions & 0 deletions modules/nf-core/software/freebayes/freebayes.nf
@@ -0,0 +1,38 @@
// Import generic module functions
include { initOptions; saveFiles; getSoftwareName } from './functions'

params.options = [:]
options = initOptions(params.options)

process FREEBAYES {
tag "$meta.id"
label 'process_low'
publishDir "${params.outdir}",
mode: params.publish_dir_mode,
saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) }

conda (params.enable_conda ? "bioconda::freebayes=1.3.5" : null)
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
container "https://depot.galaxyproject.org/singularity/freebayes:1.3.5--py38ha193a2f_3"
} else {
container "quay.io/biocontainers/freebayes:1.3.5--py38ha193a2f_3"
}

input:
tuple val(meta), path(cram), path(crai)

output:
// TODO nf-core: Named file extensions MUST be emitted for ALL output channels
tuple val(meta), path("*.bam"), emit: bam
// TODO nf-core: List additional required output channels/values here
path "*.version.txt" , emit: version

script:
def software = getSoftwareName(task.process)
def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
"""


echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' > ${software}.version.txt
"""
}
68 changes: 68 additions & 0 deletions modules/nf-core/software/freebayes/functions.nf
@@ -0,0 +1,68 @@
//
// Utility functions used in nf-core DSL2 module files
//

//
// Extract name of software tool from process name using $task.process
//
def getSoftwareName(task_process) {
return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
}

//
// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules
//
def initOptions(Map args) {
def Map options = [:]
options.args = args.args ?: ''
options.args2 = args.args2 ?: ''
options.args3 = args.args3 ?: ''
options.publish_by_meta = args.publish_by_meta ?: []
options.publish_dir = args.publish_dir ?: ''
options.publish_files = args.publish_files
options.suffix = args.suffix ?: ''
return options
}

//
// Tidy up and join elements of a list to return a path string
//
def getPathFromList(path_list) {
def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries
paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes
return paths.join('/')
}

//
// Function to save/publish module results
//
def saveFiles(Map args) {
if (!args.filename.endsWith('.version.txt')) {
def ioptions = initOptions(args.options)
def path_list = [ ioptions.publish_dir ?: args.publish_dir ]
if (ioptions.publish_by_meta) {
def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta
for (key in key_list) {
if (args.meta && key instanceof String) {
def path = key
if (args.meta.containsKey(key)) {
path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key]
}
path = path instanceof String ? path : ''
path_list.add(path)
}
}
}
if (ioptions.publish_files instanceof Map) {
for (ext in ioptions.publish_files) {
if (args.filename.endsWith(ext.key)) {
def ext_list = path_list.collect()
ext_list.add(ext.value)
return "${getPathFromList(ext_list)}/$args.filename"
}
}
} else if (ioptions.publish_files == null) {
return "${getPathFromList(path_list)}/$args.filename"
}
}
}
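The `getPathFromList` helper above just drops empty segments, trims whitespace and surrounding slashes, and joins with `/`. For readers less familiar with Groovy, an equivalent sketch in Python (a translation for illustration, not code from the module):

```python
import re

def get_path_from_list(path_list):
    """Mirror of the Groovy getPathFromList: drop empty entries,
    strip leading/trailing slashes and whitespace, join with '/'."""
    paths = [p for p in path_list if p and p.strip()]          # remove empty entries
    paths = [re.sub(r"^/+|/+$", "", p.strip()) for p in paths]  # trim slashes/whitespace
    return "/".join(paths)
```

So a list like `['preprocessing', '', '/markduplicates/']` becomes the publish path `preprocessing/markduplicates`.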