Improve BWA index workflow

Currently we by default unzip the FASTA file 

https://github.com/nf-core/eager/blob/f2a326d1b14a3d2b1664c604eb33bbc9251e4077/main.nf#L280-L300

however, both `bwa index` and `picard CreateSequenceDictinary` _does_ allow indexing of gzipped files. This can be confusing for people who have already BWA indexed their genomes while gzipped, and this then gets out of sync with the nf-core/eager bwa mapping command where it has unzipped the FASTA (so now doesn't have `.gz`) but then the actual supplied indicies _do_ have the `.gz` suffix.

It would make more intuitive sense that we skip the whole unzipping thing unless we are running `samtools faidx`. However we need to check if this may affect downstream analysis.

In the meantime we should document this properly.

Originally reported by @marcel-keller

	if("${params.fasta}".endsWith(".gz")){
	process unzip_reference{
	tag "${zipped_fasta}"

	input:
	path zipped_fasta from file(params.fasta) // path doesn't like it if a string of an object is not prefaced with a root dir (/), so use file() to resolve string before parsing to `path`

	output:
	path "$unzip" into ch_fasta into ch_fasta_for_bwaindex,ch_fasta_for_bt2index,ch_fasta_for_faidx,ch_fasta_for_seqdict,ch_fasta_for_circulargenerator,ch_fasta_for_circularmapper,ch_fasta_for_damageprofiler,ch_fasta_for_qualimap,ch_fasta_for_pmdtools,ch_fasta_for_genotyping_ug,ch_fasta_for_genotyping_hc,ch_fasta_for_genotyping_freebayes,ch_fasta_for_genotyping_pileupcaller,ch_fasta_for_vcf2genome,ch_fasta_for_multivcfanalyzer,ch_fasta_for_genotyping_angsd

	script:
	unzip = zipped_fasta.toString() - '.gz'
	"""
	pigz -f -d -p ${task.cpus} $zipped_fasta
	"""
	}
	} else {
	fasta_for_indexing = Channel
	.fromPath("${params.fasta}", checkIfExists: true)
	.into{ ch_fasta_for_bwaindex; ch_fasta_for_bt2index; ch_fasta_for_faidx; ch_fasta_for_seqdict; ch_fasta_for_circulargenerator; ch_fasta_for_circularmapper; ch_fasta_for_damageprofiler; ch_fasta_for_qualimap; ch_fasta_for_pmdtools; ch_fasta_for_genotyping_ug; ch_fasta__for_genotyping_hc; ch_fasta_for_genotyping_hc; ch_fasta_for_genotyping_freebayes; ch_fasta_for_genotyping_pileupcaller; ch_fasta_for_vcf2genome; ch_fasta_for_multivcfanalyzer;ch_fasta_for_genotyping_angsd }
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve BWA index workflow #596

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improve BWA index workflow #596

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions