Currently we by default unzip the FASTA file
|
if("${params.fasta}".endsWith(".gz")){ |
|
process unzip_reference{ |
|
tag "${zipped_fasta}" |
|
|
|
input: |
|
path zipped_fasta from file(params.fasta) // path doesn't like it if a string of an object is not prefaced with a root dir (/), so use file() to resolve string before parsing to `path` |
|
|
|
output: |
|
path "$unzip" into ch_fasta into ch_fasta_for_bwaindex,ch_fasta_for_bt2index,ch_fasta_for_faidx,ch_fasta_for_seqdict,ch_fasta_for_circulargenerator,ch_fasta_for_circularmapper,ch_fasta_for_damageprofiler,ch_fasta_for_qualimap,ch_fasta_for_pmdtools,ch_fasta_for_genotyping_ug,ch_fasta_for_genotyping_hc,ch_fasta_for_genotyping_freebayes,ch_fasta_for_genotyping_pileupcaller,ch_fasta_for_vcf2genome,ch_fasta_for_multivcfanalyzer,ch_fasta_for_genotyping_angsd |
|
|
|
script: |
|
unzip = zipped_fasta.toString() - '.gz' |
|
""" |
|
pigz -f -d -p ${task.cpus} $zipped_fasta |
|
""" |
|
} |
|
} else { |
|
fasta_for_indexing = Channel |
|
.fromPath("${params.fasta}", checkIfExists: true) |
|
.into{ ch_fasta_for_bwaindex; ch_fasta_for_bt2index; ch_fasta_for_faidx; ch_fasta_for_seqdict; ch_fasta_for_circulargenerator; ch_fasta_for_circularmapper; ch_fasta_for_damageprofiler; ch_fasta_for_qualimap; ch_fasta_for_pmdtools; ch_fasta_for_genotyping_ug; ch_fasta__for_genotyping_hc; ch_fasta_for_genotyping_hc; ch_fasta_for_genotyping_freebayes; ch_fasta_for_genotyping_pileupcaller; ch_fasta_for_vcf2genome; ch_fasta_for_multivcfanalyzer;ch_fasta_for_genotyping_angsd } |
|
} |
however, both bwa index and picard CreateSequenceDictinary does allow indexing of gzipped files. This can be confusing for people who have already BWA indexed their genomes while gzipped, and this then gets out of sync with the nf-core/eager bwa mapping command where it has unzipped the FASTA (so now doesn't have .gz) but then the actual supplied indicies do have the .gz suffix.
It would make more intuitive sense that we skip the whole unzipping thing unless we are running samtools faidx. However we need to check if this may affect downstream analysis.
In the meantime we should document this properly.
Originally reported by @marcel-keller
Currently we by default unzip the FASTA file
eager/main.nf
Lines 280 to 300 in f2a326d
however, both
bwa indexandpicard CreateSequenceDictinarydoes allow indexing of gzipped files. This can be confusing for people who have already BWA indexed their genomes while gzipped, and this then gets out of sync with the nf-core/eager bwa mapping command where it has unzipped the FASTA (so now doesn't have.gz) but then the actual supplied indicies do have the.gzsuffix.It would make more intuitive sense that we skip the whole unzipping thing unless we are running
samtools faidx. However we need to check if this may affect downstream analysis.In the meantime we should document this properly.
Originally reported by @marcel-keller