
EAGER2 cannot run on previous output of an earlier EAGER2 run due to similar file names #149

@jfy133


Describe the bug
It is not possible to run the output of a previous EAGER2 run (such as the unmapped reads) through the pipeline again, e.g. to map against a different reference.

The pipeline fails because it looks for the strings in the 'prefix' pattern defined below; if the sample has already gone through the pipeline, those strings will already appear earlier in the file name. I presume it fails at the basename step because the program cannot find the file, whose name has been erroneously truncated to the earlier generated prefix string.

i.e. SPYOLD_S0_L001_R1_001.fastq.merged.combined.sorted.bam.unmapped.combined

ERROR ~ Error executing process > 'bwa (SPYOLD_S0_L001_R1_001.fastq.merged.combined.sorted.bam.unmapped.combined)' 
 
Caused by: 
  No such property: baseName for class: nextflow.util.BlankSeparatedList 
 
Source block: 
  prefix = reads[0].toString() - ~/(_R1)?(\.combined\.)?(prefixed)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/ 
  fasta = "${index}/*.fasta" 
  """ 
  bwa aln -t ${task.cpus} $fasta $reads -n ${params.bwaalnn} -l ${params.bwaalnl} -k ${params.bwaalnk} -f "${reads.baseName}.sai" 
  bwa samse -r "@RG\\tID:ILLUMINA-${prefix}\\tSM:${prefix}\\tPL:illumina" $fasta "${reads.baseName}".sai $reads | samtools sort -@ ${task.cpus} -O bam - > "${prefix}".sorted.bam 
  samtools index "${prefix}".sorted.bam 
  """ 

To Reproduce
Run a sample through the pipeline (v2.0.5) using a FASTQ file whose name already contains any of the strings in the prefix pattern above (e.g. .combined.final.fastq.gz).

Labels

bug: Something isn't working
