Check Documentation
I have checked the following places for your error:
- nf-core/eager FAQ/troubleshooting
Description of the bug
Read group information is not added to the BAM files generated by bowtie2, which causes GATK UnifiedGenotyper (ug) genotyping to fail.
Steps to reproduce
Steps to reproduce the behaviour:
- Command line:
nextflow run nf-core/eager -r 2.2.2 -params-file bt2local.gatkug.json -c /home/ibar/.nextflow/awoonga.config -resume
Content of bt2local.gatkug.json:
{
"input": "Dingo_aDNA_NF1.tsv",
"fasta": "\/30days\/ibar/\/data\/Dingo\/reference_genome\/CanFam3.1.fasta",
"fasta_index": "\/30days\/ibar\/data\/Dingo\/reference_genome\/CanFam3.1.fasta.fai",
"seq_dict": "\/30days\/ibar\/data\/Dingo\/reference_genome\/CanFam3.1.dict",
"save_reference": "true",
"email": "i.bar@griffith.edu.au",
"skip_fastqc": "true",
"skip_trim": "true",
"mergedonly": "true",
"mapper": "bowtie2",
"hostremoval_mode": "",
"run_bam_filtering": "true",
"bam_mapping_quality_threshold": "10",
"bam_filter_minreadlength": "20",
"dedupper": "dedup",
"dedup_all_merged": "true",
"run_pmdtools": "true",
"run_bedtools_coverage": "true",
"anno_file": "\/30days\/ibar\/data\/Dingo\/reference_genome\/CanFam3.1.gff",
"run_trim_bam": "true",
"bamutils_softclip": "true",
"run_genotyping": "true",
"genotyping_source": "trimmed",
"run_mtnucratio": "true",
"mtnucratio_header": "NC_002008.4"
}
- See error:
Error executing process > 'genotyping_ug (D12)'
Caused by:
Process `genotyping_ug (D12)` terminated with an error exit status (1)
Command executed:
samtools index -b D12.trimmed.bam
gatk3 -T RealignerTargetCreator -R CanFam3.1.fasta -I D12.trimmed.bam -nt 2 -o D12.intervals
gatk3 -T IndelRealigner -R CanFam3.1.fasta -I D12.trimmed.bam -targetIntervals D12.intervals -o D12.realign.bam
gatk3 -T UnifiedGenotyper -R CanFam3.1.fasta -I D12.realign.bam -o D12.unifiedgenotyper.vcf -nt 2 --genotype_likelihoods_model SNP -stand_call_conf 30 --sample_ploidy 2 -dcov 250 --output_mode EMIT_VARIANTS_ONLY
rm D12.realign.{bam,bai}
pigz -p 2 D12.unifiedgenotyper.vcf
Command exit status:
1
Command output:
(empty)
Command error:
INFO 20:30:52,376 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:30:52,380 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
INFO 20:30:52,381 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 20:30:52,381 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 20:30:52,384 HelpFormatter - Program Args: -T RealignerTargetCreator -R CanFam3.1.fasta -I D12.trimmed.bam -nt 2 -o D12.intervals
INFO 20:30:52,390 HelpFormatter - Executing as ibar@aw028.local on Linux 3.10.0-693.5.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_144-b01.
INFO 20:30:52,390 HelpFormatter - Date/Time: 2021/01/10 20:30:52
INFO 20:30:52,391 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:30:52,391 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:30:52,514 GenomeAnalysisEngine - Strictness is SILENT
INFO 20:30:52,772 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 20:30:52,778 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 20:30:52,919 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.14
INFO 20:30:54,823 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.5-0-g36282e4):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: SAM/BAM/CRAM file D12.trimmed.bam is malformed. Please see http://gatkforums.broadinstitute.org/discussion/1317/collected-faqs-about-input-files-for-sequence-read-data-bam-cram for more information. Error details: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups
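The condition GATK complains about can be checked before the genotyping step by looking for `@RG` lines in the BAM header. The following is a minimal sketch, assuming `samtools` is available on the `PATH`; the helper name `has_read_groups` is hypothetical and not part of nf-core/eager.

```shell
# has_read_groups: reads SAM header text on stdin and succeeds (exit 0)
# if at least one @RG line is present. Hypothetical helper name.
has_read_groups() {
  grep -q '^@RG'
}

# Usage, assuming samtools is installed:
#   samtools view -H D12.trimmed.bam | has_read_groups \
#     || echo "D12.trimmed.bam: no read groups in header"
```

The helper only inspects the header; it does not check whether individual records carry an `RG:Z:` tag.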
Expected behaviour
The BAM files should contain read group information so that subsequent genotyping with GATK can run.
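For context, bowtie2 writes an `@RG` header line only when `--rg-id` is passed (additional fields go in via `--rg`). The sketch below builds such arguments from a sample name; the helper name `bt2_rg_args` and the chosen fields (`SM`, `PL:illumina`) are assumptions for illustration and not necessarily what nf-core/eager would emit.

```shell
# bt2_rg_args: print bowtie2 read-group options for a sample name.
# Hypothetical helper; the SM and PL values are illustrative assumptions.
bt2_rg_args() {
  sample="$1"
  printf -- '--rg-id %s --rg SM:%s --rg PL:illumina' "$sample" "$sample"
}

# Usage inside a mapping command (remaining options omitted for brevity):
#   bowtie2 -x reference_genome/CanFam3.1.fasta $(bt2_rg_args D12) \
#     -1 D12_1.sampled1M.trimmed.fq.gz -2 D12_2.sampled1M.trimmed.fq.gz
#
# For a BAM that already exists, samtools addreplacerg can add the
# read group afterwards (output name D12.rg.bam is an assumption):
#   samtools addreplacerg -r 'ID:D12' -r 'SM:D12' -o D12.rg.bam D12.trimmed.bam
```

Either route should leave an `@RG` line in the header and an `RG` tag on each record, which is what the GATK error asks for.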
Log files
Have you provided the following extra information/files:
- The .nextflow.log file: Dingo_aDNA_NF2.log
System
- Hardware: HPC
- Executor: PBSPro
- OS: RHEL
- Version: Linux 3.10.0-693.5.2.el7.x86_64
- Runtime: Groovy 3.0.5 on OpenJDK 64-Bit Server VM 11.0.1+13-LTS
Nextflow Installation
- Version: 20.10.0 build 5430
Container engine
- Engine: Singularity
- version: 3.6.3
- Image tag: nfcore/eager:2.2.2
Additional context
This is the output of picard ValidateSamFile -I D12_PE.mapped.bam:
08:51:19.221 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs1/homes/ibar/.pyenv/versions/miniconda-latest/envs/aDNA/share/picard-2.23.3-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jan 11 08:51:19 AEST 2021] ValidateSamFile --INPUT D12_PE.mapped.bam --MODE VERBOSE --MAX_OUTPUT 100 --IGNORE_WARNINGS false --VALIDATE_INDEX true --INDEX_VALIDATION_STRINGENCY EXHAUSTIVE --IS_BISULFITE_SEQUENCED false --MAX_OPEN_TEMP_FILES 8000 --SKIP_MATE_VALIDATION false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Mon Jan 11 08:51:19 AEST 2021] Executing as ibar@awoonga1.local on Linux 3.10.0-693.5.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_265-b11; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.23.3
WARNING 2021-01-11 08:51:19 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
ERROR::POORLY_FORMATTED_HEADER_TAG:File /gpfs1/scratch/30days/ibar/data/Dingo/Dingo_aDNA_NF2_process_10_01_2021/work/2d/c7fc0b43e86023978259162367df06/D12_PE.mapped.bam, Error parsing SAM header. Problem parsing @PG key:value pair. Line:
@PG ID:bowtie2 PN:bowtie2 VN: CL:"/opt/conda/envs/nf-core-eager-2.2.2/bin/bowtie2-align-s --wrapper basic-0 -x reference_genome/CanFam3.1.fasta -p 4 --sensitive-local -1 D12_1.sampled1M.trimmed.fq.gz -2 D12_2.sampled1M.trimmed.fq.gz"
ERROR::MISSING_READ_GROUP:Read groups is empty
WARNING::RECORD_MISSING_READ_GROUP:Read name ST-E00127:1042:H7TV7CCX2:6:2104:31893:69045, A record is missing a read group
WARNING::RECORD_MISSING_READ_GROUP:Read name ST-E00127:1042:H7TV7CCX2:6:2104:31893:69045, A record is missing a read group
WARNING::RECORD_MISSING_READ_GROUP:Read name ST-E00127:1042:H7TV7CCX2:6:2109:21186:69414, A record is missing a read group
WARNING::RECORD_MISSING_READ_GROUP:Read name ST-E00127:1042:H7TV7CCX2:6:2109:21186:69414, A record is missing a read group
WARNING::RECORD_MISSING_READ_GROUP:Read name ST-E00127:1042:H7TV7CCX2:6:2115:4178:71629, A record is missing a read group
...
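Because the verbose output repeats the same warning for every record, a tally by message type is easier to scan. The following is a small sketch, assuming the ValidateSamFile output has been saved to a text file; the function name `tally_validation` and the file name `validate.txt` are hypothetical. Picard's own `--MODE SUMMARY` option gives a comparable histogram.

```shell
# tally_validation: count ValidateSamFile messages by severity and type.
# Reads the tool's text output on stdin; hypothetical helper name.
tally_validation() {
  # Lines look like "ERROR::MISSING_READ_GROUP:Read groups is empty",
  # so splitting on ':' gives severity in $1 and message type in $3.
  awk -F':' '/^(ERROR|WARNING)::/ { n[$1 "::" $3]++ }
             END { for (k in n) print n[k], k }' | sort -rn
}

# Usage (validate.txt is an assumed file name):
#   picard ValidateSamFile -I D12_PE.mapped.bam > validate.txt 2>&1
#   tally_validation < validate.txt
```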