Check Documentation
I have checked the following places for your error:
Description of the bug
I used Sarek to generate gVCF files with HaplotypeCaller, planning to run joint genotyping myself on all samples together. Because this is exome sequencing, I first used the option --target_bed, but found that this results in many missing genotypes. The reason appears to be that "bcftools isec" is run on the gVCF files, which removes every non-variant block whose start position is not within a region listed in the BED file. As a result, many reference blocks are dropped even when part of the block is covered by the BED file (bcftools does not look at the END tag). The VCF files generated for each sample are fine, though.
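To illustrate the difference, here is a toy sketch in Python. It is not Sarek's or bcftools' actual code; the function names and coordinates are made up, and it only models the behaviour described above (filtering on the start position versus filtering on the whole block span).

```python
# Toy model: gVCF reference blocks and target regions as (start, end),
# both 1-based and inclusive. All values are hypothetical.

def keep_by_start(blocks, targets):
    """Keep a block only if its start position lies inside a target."""
    return [b for b in blocks
            if any(t_start <= b[0] <= t_end for t_start, t_end in targets)]

def keep_by_overlap(blocks, targets):
    """Keep a block if any part of its [start, end] span overlaps a target."""
    return [b for b in blocks
            if any(b[0] <= t_end and b[1] >= t_start for t_start, t_end in targets)]

blocks = [(90, 150), (151, 180), (181, 260)]  # non-variant blocks (POS, END)
targets = [(100, 200)]                        # one target region

print(keep_by_start(blocks, targets))    # [(151, 180), (181, 260)]
print(keep_by_overlap(blocks, targets))  # [(90, 150), (151, 180), (181, 260)]
```

With start-only filtering, the block 90-150 is dropped, so positions 100-150 inside the target end up with no genotype.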
Steps to reproduce
Command line:
nextflow run ~/sarek/main.nf -profile uppmax,singularity -with-singularity /sw/data/ToolBox/nf-core/nfcore-sarek-2.6.1.img --containerPath ~/sarek/containers --custom_config_base ~/configs-master/ --genome_base /sw/data/ToolBox/hg38bundle/ --project XXX --genome GRCh38 --step prepare_recalibration --target_bed Twist_Exome_RefSeq_targets_hg38.bed --input mapped_bam_files.tsv
Expected behaviour
Even if it might be better to do the joint genotyping on the full file anyway, I would expect the gVCF files generated with --target_bed to include at least the regions in the given BED file. Alternatively, a note or warning about this in the description of --target_bed would help.
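As a rough sketch of the behaviour I would expect, the Python snippet below keeps every gVCF record whose span (POS up to INFO/END) overlaps a BED region. This is only an assumption about how such a filter could look, not a proposal for the actual implementation; the helper names and example records are hypothetical, and it assumes plain-text, tab-separated input.

```python
import re

# Matches END=<int> as a whole INFO key (not e.g. SVEND=...).
END_RE = re.compile(r"(?:^|;)END=(\d+)")

def parse_bed(lines):
    """Read BED lines (0-based, half-open) into {chrom: [(start, end)]},
    converted to 1-based inclusive coordinates to match VCF POS."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start) + 1, int(end)))
    return regions

def record_span(fields):
    """Return (start, end) of a VCF record, using INFO/END when present,
    otherwise the length of the REF allele."""
    pos = int(fields[1])
    match = END_RE.search(fields[7])
    end = int(match.group(1)) if match else pos + len(fields[3]) - 1
    return pos, end

def filter_gvcf(vcf_lines, regions):
    """Yield header lines and every record overlapping a target region."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = record_span(fields)
        if any(start <= r_end and end >= r_start
               for r_start, r_end in regions.get(fields[0], [])):
            yield line

# Hypothetical example data
bed = ["chr1\t99\t200"]
gvcf = [
    "##fileformat=VCFv4.2",
    "chr1\t90\t.\tA\t<NON_REF>\t.\t.\tEND=150\tGT:DP\t0/0:30",
    "chr1\t250\t.\tC\t<NON_REF>\t.\t.\tEND=300\tGT:DP\t0/0:28",
]
kept = list(filter_gvcf(gvcf, parse_bed(bed)))
for line in kept:
    print(line)
```

Here the block starting at position 90 is kept because it extends into the target, while the block at 250-300 lies entirely outside and is removed.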
Log files
Have you provided the following extra information/files:
System
- Hardware: HPC
- Executor: slurm
- Sarek version: 2.6.1
Nextflow Installation
Container engine