Skip to content

PMDTools Error #324

@jfy133

Description

@jfy133

Describe the bug
During a stress test of nf-core/eager -r dev (on 2019-12-20), where I ran a whole HiSeq lane of random stuff mapping to hs37d5, I encountered the following error.

Process `pmdtools (<THE_SAMPLE>_S0_L001_R1_001.fastq.truncated.mapped.bam.filtered_rmdup)` terminated with an error exit status (1)
Command executed:
  #Run Filtering step
  samtools calmd -b <THE_SAMPLE>_S0_L001_R1_001.fastq.truncated.mapped.bam.filtered_rmdup.bam hs37d5.fa | samtools view -h - | pmdtools --threshold 3 --UDGhalf  --header | samtools view -@ 2 -Sb - > "<THE_SAMPLE>_S0_L001_R1_001.fastq.truncated.mapped.bam.filtered_rmdup".pmd.bam
  #Run Calc Range step
  samtools calmd -b <THE_SAMPLE>_S0_L001_R1_001.fastq.truncated.mapped.bam.filtered_rmdup.bam hs37d5.fa | samtools view -h - | pmdtools --deamination --range 10 --UDGhalf  -n 10000 > "<THE_SAMPLE>_S0_L001_R1_001.fastq.truncated.mapped.bam.filtered_rmdup".cpg.range."10".txt
  samtools index "-c" <THE_SAMPLE>_S0_L001_R1_001.fastq.truncated.mapped.bam.filtered_rmdup.pmd.bam
Command exit status:
  1
Command output:
  (empty)
Command error:
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:2206:10673:12216': '54C0' -> '54n0'
  [bam_fillmd1] different NM for read 'K00233:138:HFMK7BBXY:1:2122:29894:3635': 0 -> 1
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:2122:29894:3635': '33' -> '0n32'
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:1104:4848:3565': '0C0A74' -> '0n0n74'
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:1219:5355:27514': '46T0' -> '46n0'
  [bam_fillmd1] different NM for read 'K00233:138:HFMK7BBXY:1:1116:9942:23856': 0 -> 1
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:1116:9942:23856': '45' -> '43Y1'
  [bam_fillmd1] different NM for read 'K00233:138:HFMK7BBXY:1:2212:29183:27443': 1 -> 2
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:2212:29183:27443': '51C24' -> '0y50c24'
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:1110:22130:12058': '38C20' -> '38R20'
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:1215:21734:28393': '35T31G0' -> '35K31G0'
  [bam_fillmd1] different NM for read 'K00233:138:HFMK7BBXY:1:2105:23541:34090': 0 -> 1
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:2105:23541:34090': '39' -> '30r8'
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:2227:32522:16805': '20A9' -> '20y9'
  [bam_fillmd1] different NM for read 'K00233:138:HFMK7BBXY:1:1120:15696:31189': 1 -> 2
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:1120:15696:31189': '62G2' -> '55w6g2'
  [bam_fillmd1] different NM for read 'K00233:138:HFMK7BBXY:1:2123:3468:12743': 0 -> 1
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:2123:3468:12743': '49' -> '13k35'
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:1107:10165:16014': '10C38' -> '10M38'
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:2124:17147:39682': '7A0C67' -> '7a0m67'
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:2112:29254:7767': '14A16A44' -> '14s16k44'
  [bam_fillmd1] different NM for read 'K00233:138:HFMK7BBXY:1:1208:31040:28340': 1 -> 3
  [bam_fillmd1] different MD for read 'K00233:138:HFMK7BBXY:1:1208:31040:28340': '8C67' -> '8c51w9k5'
  Traceback (most recent call last):
    File "/projects1/users/fellows/nextflow/nf_cache/conda/nf-core-eager-2.1.0dev-c854b839ac015c281c3696d3b5072a73/bin/pmdtools", line 854, in <module>
      if real_ref_seq[i+1] != 'G' and i != 0: continue
  IndexError: string index out of range
Work dir:
  /projects1/users/fellows/nextflow/eager2/eager2_publication/resource_estimation/output/hiseq_se/work/b3/3752c89b0a9959722d48d65491de75
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Closer inspection and step-by-step manually running of the commands identified the failure to occur at:

samtools calmd -b PIE038.A0101.SG1.1_S0_L001_R1_001.fastq.truncated.mapped.bam.filtered_rmdup.bam hs37d5.fa | samtools view -h - | pmdtools --threshold 3 --UDGhalf  --header

I'm not familiar with pmdtools (not yet needed for my own research) @TCLamnidis @pontussk, do you have any ideas?

To Reproduce

nextflow run nf-core/eager -r dev \
-c nextflow_custom.config -profile shh_custom,conda \
--reads '../../../resource_estimation/input/hiseq_se/*/PIE*{R1}*.fastq.gz' \
--singleEnd \
--fasta '/projects1/Reference_Genomes/Human/hs37d5/hs37d5.fa' \
--fasta_index '/projects1/Reference_Genomes/Human/hs37d5/hs37d5.fa.fai' \
--bwa_index '/projects1/Reference_Genomes/Human/hs37d5/hs37d5.fa' \
--seq_dict '/projects1/Reference_Genomes/Human/hs37d5/hs37d5.dict' \
--outdir '../../../resource_estimation/output/hiseq_se/' \
-w '../../../resource_estimation/output/hiseq_se/work/' \
--run_bam_filtering --bam_mapping_quality_threshold 20 \
--bam_discard_unmapped \
--bam_unmapped_type 'fastq' \
--run_pmdtools \
--trim_bam

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingneeds upstream fixNeeds a fix in the upstream tool project

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions