Skip to content

BWA step fails on large genomes, because samtools index requires the "-c" (CSI index) flag #146

@lucventurini

Description

@lucventurini

Observed while trialing the pipeline together with @hesterjvs on H. vulgare, with its genome over 4GBs.
The fix would be to add a "-c" flag to the samtools index command. However, this will result in the BAM file having a .csi, rather than .bai, index.
The analysis was done using the latest dev branch, with the Docker profile (revision: 435812b [dev]).

Key part of the log output:

ERROR ~ Error executing process > 'bwa (Sample_5_S5_L002_R1_001)'

Caused by:
  Process `bwa (Sample_5_S5_L002_R1_001)` terminated with an error exit status (1)

Command executed:

  bwa aln -t 8 bwa_index/*.fasta Sample_5_S5_L002_R1_001.combined.prefixed.fq.gz -n 0.04 -l 32 -k 2 -f "Sample_5_S5_L002_R1_001.combined.prefixed.fq.sai"
  bwa samse -r "@RG\tID:ILLUMINA-Sample_5_S5_L002_R1_001\tSM:Sample_5_S5_L002_R1_001\tPL:illumina" bwa_index/*.fasta "Sample_5_S5_L002_R1_001.combined.prefixed.fq".sai Sample_5_S5_L002_R1_001.combined.prefixed.fq.gz | samtools sort -@ 8 -O bam - > "Sample_5_S5_L002_R1_001".sorted.bam
  samtools index "Sample_5_S5_L002_R1_001".sorted.bam

Command exit status:
  1

Command output:
  (empty)

Command error:
  [bwa_aln] 17bp reads: max_diff = 2
  [bwa_aln] 38bp reads: max_diff = 3
  [bwa_aln] 64bp reads: max_diff = 4
  [bwa_aln] 93bp reads: max_diff = 5
  [bwa_aln] 124bp reads: max_diff = 6
  [bwa_aln] 157bp reads: max_diff = 7
  [bwa_aln] 190bp reads: max_diff = 8
  [bwa_aln] 225bp reads: max_diff = 9
  [bwa_aln_core] calculate SA coordinate... 37.53 sec
  [bwa_aln_core] write to the disk... 0.00 sec
  [bwa_aln_core] 7289 sequences have been processed.
  [main] Version: 0.7.17-r1188
  [main] CMD: bwa aln -t 8 -n 0.04 -l 32 -k 2 -f Sample_5_S5_L002_R1_001.combined.prefixed.fq.sai bwa_index/Hordeum_vulgare.IBSC_v2.dna.toplevel.fasta Sample_5_S5_L002_R1_001.combined.prefixed.fq.gz
  [main] Real time: 124.051 sec; CPU: 41.598 sec
  [bwa_aln_core] convert to sequence coordinate... 4.41 sec
  [bwa_aln_core] refine gapped alignments... 0.91 sec
  [bwa_aln_core] print alignments... 0.01 sec
  [bwa_aln_core] 7289 sequences have been processed.
  [main] Version: 0.7.17-r1188
  [main] CMD: bwa samse -r @RG\tID:ILLUMINA-Sample_5_S5_L002_R1_001\tSM:Sample_5_S5_L002_R1_001\tPL:illumina bwa_index/Hordeum_vulgare.IBSC_v2.dna.toplevel.fasta Sample_5_S5_L002_R1_001.combined.prefixed.fq.sai Sample_5_S5_L002_R1_001.combined.prefixed.fq.gz
  [main] Real time: 18.397 sec; CPU: 5.824 sec
  [bam_sort_core] merging from 0 files and 8 in-memory blocks...
  [E::hts_idx_push] Region 543480053..543480129 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
  samtools index: failed to create index for "Sample_5_S5_L002_R1_001.sorted.bam": Numerical result out of range

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions