use a meta map #256
Thanks @maxulysse! Will have a proper look at this tomorrow at some point 👍
FriederikeHanssen
left a comment
Love how readable everything is now 😍
fastqc -t 2 -q ${idSample}_${idRun}_R1.fastq.gz ${idSample}_${idRun}_R2.fastq.gz
[ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
fastqc --threads ${task.cpus} ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz
Out of curiosity: what does [ ! -f ... ] do?
No idea, I just copied it over from chipseq.
I figured it would be easier to update from the nf-core modules from there.
It's bash notation for checking that a file doesn't exist 🙂 The ln -s after the && only runs when that check succeeds.
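For anyone else wondering, here is a minimal sketch of the `[ ! -f FILE ] && ln -s ...` idiom outside of Nextflow. The file names and the `link_if_missing` helper are made up for illustration and are not part of the module; the sketch runs in a temporary directory so it does not touch real data.

```shell
# Work in a throwaway directory (hypothetical setup, not part of the pipeline).
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Hypothetical input file standing in for ${reads[0]}.
printf 'dummy\n' > input_R1.fastq.gz

link_if_missing() {
    # $1 = existing source file, $2 = desired link name.
    # [ ! -f "$2" ] is true when "$2" is not an existing regular file
    # (symlinks are followed), so ln -s only runs in that case.
    [ ! -f "$2" ] && ln -s "$1" "$2"
    # Return success even when the test was false and nothing was linked.
    return 0
}

link_if_missing input_R1.fastq.gz sample_1.fastq.gz   # creates the symlink
link_if_missing input_R1.fastq.gz sample_1.fastq.gz   # does nothing, link already resolves to a file
ls -l sample_1.fastq.gz
```

The guard is what makes the command safe to re-run, for example when a task is resumed in a work directory where the link already exists.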
${extra} \
-t ${task.cpus} \
${fasta} ${reads} | \
samtools sort --threads ${task.cpus} -m 2G - > ${meta.id}.bam
Why is the memory hardcoded here?
I did not pay enough attention there. I guess I just copied it over from the current sarek dev; I can set it back to the way it was.
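As a rough illustration of the alternative to a hardcoded `-m 2G`: `samtools sort -m` sets the approximate memory per thread, so one option is to derive it from the resources given to the task. The sketch below is only an assumption about how that could look; `total_mem_mb`, `cpus` and `per_thread_mem_mb` are hypothetical names, and it only prints the command rather than running samtools. It also ignores the memory needed by the aligner on the other side of the pipe.

```shell
# Hypothetical resource values; in a Nextflow process these would come
# from the task's allocated memory and CPUs.
total_mem_mb=16384
cpus=4

per_thread_mem_mb() {
    # Integer division: total memory in MB divided by the number of threads.
    # $1 = total memory (MB), $2 = number of threads.
    echo $(( $1 / $2 ))
}

mem="$(per_thread_mem_mb "$total_mem_mb" "$cpus")"

# Print the command that would be run instead of executing it.
echo "samtools sort --threads ${cpus} -m ${mem}M - > sample.bam"
```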
drpatelh
left a comment
Mainly some reorganisation of the module structure and consistency with using module options and syntax for reusability and flexibility.
def bwamem2_mem_options = [:]
bwamem2_mem_options.args_bwamem2 = "-K 100000000 -M"
These options can be pre-defined in a map in conf/modules.config like here, included via nextflow.config like here and then you can even append parameters like here in the main script if required.
Hopefully, this means the software parameters are easier to pass around the script and are more customisable by the developer/user. It also means you don't have to initialise maps all over the place for the module settings, because this is already explicitly done in modules.config.
We should stick to the same notation for accessing variables in the module files though, i.e. the 5 I have needed so far are listed here
label 'process_high'
publishDir "${params.outdir}/bwamem2_mem", mode: 'copy'
publishDir "${params.outdir}/bwamem2/${meta.sample}",
You should remove any customisation of output directories from this code, as this should be customisable from the opts map that comes into the module. The logic still needs a little work, but for now this is the generic code I am using here
script:
CN = params.sequencing_center ? "CN:${params.sequencing_center}\\t" : ""
readGroup = "@RG\\tID:${run}\\t${CN}PU:${run}\\tSM:${sample}\\tLB:${sample}\\tPL:${params.sequencer}"
readGroup = "@RG\\tID:${meta.run}\\t${CN}PU:${meta.run}\\tSM:${meta.sample}\\tLB:${meta.sample}\\tPL:ILLUMINA"
This should come in via the meta parameter, because not everyone will want to create the read group in this way: they won't have all of the same values in the map, e.g. see here
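To make the idea concrete, here is a small sketch of building the read group string outside the module, so that the module only receives a finished value. The `build_read_group` function and its argument order are hypothetical; the field layout simply mirrors the `readGroup` line in the diff, with the `CN:` field added only when a sequencing centre is given.

```shell
build_read_group() {
    # $1 = run id, $2 = sample, $3 = platform, $4 = optional sequencing centre.
    local id="$1" sample="$2" platform="$3" center="${4:-}"
    local cn=""
    # Only emit the CN field when a centre was provided.
    if [ -n "$center" ]; then
        cn="CN:${center}\\t"
    fi
    # "\\t" inside double quotes yields a literal backslash followed by t,
    # which is the escaped form the aligner's read group option expects.
    printf '%s\n' "@RG\\tID:${id}\\t${cn}PU:${id}\\tSM:${sample}\\tLB:${sample}\\tPL:${platform}"
}

build_read_group run1 sampleA ILLUMINA
build_read_group run1 sampleA ILLUMINA MyCentre
```

A pipeline that has different metadata could build the string differently and still pass it to the same module.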
tuple val(meta), path("*.bam"), path("*.bai")
script:
CN = params.sequencing_center ? "CN:${params.sequencing_center}\\t" : ""
This parameter should come in via arguments created by the pipeline and not be hardcoded here. Not everyone will use this!
samtools sort --threads ${task.cpus} -m 2G - > ${sample}_${run}.bam
samtools index ${sample}_${run}.bam
bwa-mem2 mem \
${options.args_bwamem2} \
Suggested change: replace
${options.args_bwamem2} \
with
$opts.args \
${fasta} ${reads} | \
samtools sort --threads ${task.cpus} -m 2G - > ${meta.id}.bam
samtools index ${meta.id}.bam
We will need to support suffixes too, so that the bam files can be named differently if required. This would apply to other modules too, e.g. here
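One possible shape for this, sketched in plain shell: the output prefix is the sample id plus an optional suffix, and the suffix defaults to empty so existing names are unchanged. The `bam_prefix` helper and the `.sorted` suffix are hypothetical and only illustrate the naming logic, not how the module options actually carry the suffix.

```shell
bam_prefix() {
    # $1 = sample id, $2 = optional suffix (defaults to the empty string).
    local id="$1" suffix="${2:-}"
    printf '%s\n' "${id}${suffix}"
}

prefix="$(bam_prefix sample1 .sorted)"
echo "${prefix}.bam"              # name used when a suffix is supplied
echo "$(bam_prefix sample1).bam"  # name used when no suffix is supplied
```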
for
I am not too keen on using a mixture of
name of the tool is now
Ah, I see. Good point 👍 If we want to split on the
that's a good point
Using a meta map à la @drpatelh
nf-core/sarek pull request
Many thanks for contributing to nf-core/sarek!
Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).
PR checklist
nextflow run . -profile test,docker).
nf-core lint .).
docs is updated
CHANGELOG.md is updated
README.md is updated
Learn more about contributing: CONTRIBUTING.md