2 changes: 1 addition & 1 deletion .github/workflows/awsfulltest.yml
@@ -15,7 +15,7 @@ jobs:
  runs-on: ubuntu-latest
  steps:
  - name: Setup Miniconda
- uses: goanpeca/setup-miniconda@v1.0.2
+ uses: conda-incubator/setup-miniconda@v2
  with:
  auto-update-conda: true
  python-version: 3.7
2 changes: 1 addition & 1 deletion .github/workflows/awstest.yml
@@ -13,7 +13,7 @@ jobs:
  runs-on: ubuntu-latest
  steps:
  - name: Setup Miniconda
- uses: goanpeca/setup-miniconda@v1.0.2
+ uses: conda-incubator/setup-miniconda@v2
  with:
  auto-update-conda: true
  python-version: 3.7
17 changes: 5 additions & 12 deletions .github/workflows/ci.yml
@@ -26,9 +26,9 @@ jobs:
  uses: actions/checkout@v2

  - name: Check if Dockerfile or Conda environment changed
- uses: technote-space/get-diff-action@v1
+ uses: technote-space/get-diff-action@v4
  with:
- PREFIX_FILTER: |
+ FILES: |
  Dockerfile
  environment.yml

@@ -142,22 +142,15 @@ jobs:
  - name: PMDTOOLS Test PMDtools works alone
  run: |
  nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_pmdtools
- - name: GATK 3.5 Download resource files
- run: |
- mkdir -p jars/gatk_3_5
- wget https://storage.googleapis.com/gatk-software/package-archive/gatk/GenomeAnalysisTK-3.5-0-g36282e4.tar.bz2 -P jars/gatk_3_5
- tar xvf jars/gatk_3_5/GenomeAnalysisTK-3.5-0-g36282e4.tar.bz2 -C jars/gatk_3_5/
- chmod +x jars/gatk_3_5/GenomeAnalysisTK.jar
- GATK_JAR=$(readlink -f jars/gatk_3_5/GenomeAnalysisTK.jar)
  - name: GENOTYPING_UG AND MULTIVCFANALYZER Test running GATK UnifiedGenotyper and MultiVCFAnalyzer, additional VCFS
  run: |
- nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_genotyping --gatk_ug_jar '/home/runner/work/eager/eager/jars/gatk_3_5/GenomeAnalysisTK.jar' --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer --additional_vcf_files 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/vcf/JK2772_CATCAGTGAGTAGA_L008_R1_001.fastq.gz.tengrand.fq.combined.fq.mapped_rmdup.bam.unifiedgenotyper.vcf.gz' --write_allele_frequencies
+ nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_genotyping --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer --additional_vcf_files 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/vcf/JK2772_CATCAGTGAGTAGA_L008_R1_001.fastq.gz.tengrand.fq.combined.fq.mapped_rmdup.bam.unifiedgenotyper.vcf.gz' --write_allele_frequencies
  - name: COMPLEX LANE/LIBRARY MERGING Test running lane and library merging prior to GATK UnifiedGenotyper and running MultiVCFAnalyzer
  run: |
- nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_complex,docker --run_genotyping --gatk_ug_jar '/home/runner/work/eager/eager/jars/gatk_3_5/GenomeAnalysisTK.jar' --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer
+ nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_complex,docker --run_genotyping --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP' --run_multivcfanalyzer
  - name: GENOTYPING_UG ON TRIMMED BAM Test
  run: |
- nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_genotyping --run_trim_bam --genotyping_source 'trimmed' --gatk_ug_jar '/home/runner/work/eager/eager/jars/gatk_3_5/GenomeAnalysisTK.jar' --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP'
+ nextflow run ${GITHUB_WORKSPACE} -profile test_tsv,docker --run_genotyping --run_trim_bam --genotyping_source 'trimmed' --genotyping_tool 'ug' --gatk_out_mode 'EMIT_ALL_SITES' --gatk_ug_genotype_model 'SNP'
  - name: BAM_INPUT Run the basic pipeline with the bam input profile, skip AdapterRemoval as no convertBam
  run: |
  nextflow run ${GITHUB_WORKSPACE} -profile test_tsv_bam,docker --skip_adapterremoval
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -3,7 +3,7 @@
  The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
  and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

- ## [2.3.0dev] - Wangen - Unreleased
+ ## [2.2.2dev] - Unreleased

  ### `Added`

@@ -13,8 +13,13 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

  - Fixed AWS full test profile.
  - [#587](https://github.com/nf-core/eager/issues/587) - Re-implemented AdapterRemovalFixPrefix for DeDup compatibility of including singletons
+ - [#602](https://github.com/nf-core/eager/issues/602) - Added the newly available GATK 3.5 conda package.
  - [#610](https://github.com/nf-core/eager/issues/610) - Create bwa_index channel when specifying circularmapper as mapper

+ ### `Deprecated`
+
+ - Flag `--gatk_ug_jar` has been removed, as GATK 3.5 is now available within the nf-core/eager software environment.
+
## [2.2.1] - 2020-10-20

### `Fixed`
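Since `--gatk_ug_jar` is removed by this change, existing launch scripts that still pass it will need updating. A minimal sketch of one way to drop the flag and its value from an argument list follows; `strip_gatk_ug_jar` is a hypothetical helper name, not part of nf-core/eager, and it assumes the flag is always written as two separate arguments (`--gatk_ug_jar PATH`).

```shell
# Hypothetical helper (not part of nf-core/eager): print every argument,
# one per line, except --gatk_ug_jar and the path that follows it.
strip_gatk_ug_jar() {
    skip_next=0
    for arg in "$@"; do
        if [ "$skip_next" -eq 1 ]; then
            skip_next=0              # this argument is the jar path; drop it
            continue
        fi
        case "$arg" in
            --gatk_ug_jar) skip_next=1 ;;        # drop the flag itself
            *) printf '%s\n' "$arg" ;;           # keep everything else
        esac
    done
}

# Example call (prints the three surviving arguments on separate lines):
# strip_gatk_ug_jar --run_genotyping --gatk_ug_jar '../bin/GenomeAnalysisTK.jar' --genotyping_tool ug
```

Printing one argument per line keeps the helper POSIX-compatible; arguments that themselves contain newlines would need a different approach.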
4 changes: 2 additions & 2 deletions Dockerfile
@@ -7,10 +7,10 @@ COPY environment.yml /
RUN conda env create --quiet -f /environment.yml && conda clean -a

# Add conda installation dir to PATH (instead of doing 'conda activate')
- ENV PATH /opt/conda/envs/nf-core-eager-2.3.0dev/bin:$PATH
+ ENV PATH /opt/conda/envs/nf-core-eager-2.2.2dev/bin:$PATH

# Dump the details of the installed packages to a file for posterity
- RUN conda env export --name nf-core-eager-2.3.0dev > nf-core-eager-2.3.0dev.yml
+ RUN conda env export --name nf-core-eager-2.2.2dev > nf-core-eager-2.2.2dev.yml

# Instruct R processes to use these empty files instead of clashing with a local version
RUN touch .Rprofile
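The version string in this hunk has to stay in step with the `name:` field of `environment.yml`, which this change also edits. A minimal sketch of a local consistency check is shown below; the function names `conda_env_name` and `dockerfile_matches_env` are hypothetical, and the check assumes the Dockerfile references the environment through a `/opt/conda/envs/<name>/bin` path as it does here.

```shell
# Print the value of the top-level `name:` key of a Conda environment file.
conda_env_name() {
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

# Succeed only if the Dockerfile ($2) refers to the bin directory of the
# environment named in the environment file ($1).
dockerfile_matches_env() {
    env_name=$(conda_env_name "$1")
    # -F: treat the path as a fixed string, so dots in the version are literal
    [ -n "$env_name" ] && grep -qF "/opt/conda/envs/${env_name}/bin" "$2"
}

# Example call from the repository root:
# dockerfile_matches_env environment.yml Dockerfile && echo "names agree"
```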
28 changes: 4 additions & 24 deletions docs/usage.md
@@ -1556,12 +1556,8 @@ UnifiedGenotyper or GATK Haplotype Caller (v4); and the FreeBayes Caller.
Specify 'ug', 'hc', 'freebayes', 'pileupcaller' and 'angsd' respectively.

> Note that while UnifiedGenotyper is more suitable for low-coverage ancient DNA
- > (HaplotypeCaller does _de novo_ assembly around each variant site), it is
- > officially deprecated by the Broad Institute and is only accessible by an
- > archived version not properly available on `conda`. Therefore if specifying
- > 'ug', will need to supply a GATK 3.5 `-jar` to the parameter `gatk_ug_jar`.
- > Note that this means the pipeline is not fully reproducible in this
- > configuration, unless you personally supply the `.jar` file.
+ > (HaplotypeCaller does _de novo_ assembly around each variant site), be aware
+ > that GATK 3.5 is officially deprecated by the Broad Institute.

#### `--genotyping_source`

@@ -1570,17 +1566,6 @@ modules you have turned on. Options are: `'raw'` for mapped only, filtered, or
DeDup BAMs (with priority right to left); `'trimmed'` (for base clipped BAMs);
`'pmd'` (for pmdtools output). Default is: `'raw'`.

- #### `--gatk_ug_jar`
-
- Specify a path to a local copy of a GATK 3.5 `.jar` file, preferably version
- '3.5-0-g36282e4'. The download location of this may be available from the GATK
- forums or the [Google Cloud
- Storage](https://console.cloud.google.com/storage/browser/gatk-software/package-archive/gatk?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false)
- of the Broad Institute.
-
- > You must manually report your version of GATK 3.5 in publications/MultiQC as
- > it is not included in our container.
-
#### `--gatk_call_conf`

If selected, specify a GATK genotyper phred-scaled confidence threshold of a
@@ -4139,10 +4124,7 @@ Prior setting up the nf-core/eager run, we will need:
3. A GFF file of gene sequence annotations (normally supplied with reference
genomes downloaded from NCBI Genomes, in this context from
[here](https://www.ncbi.nlm.nih.gov/genome/?term=Yersinia+pestis))
- 4. The JAR file for GATK v3.5 downloadable from
- [here](https://console.cloud.google.com/storage/browser/gatk-software/package-archive/gatk?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false)
- (Make sure to extract the Zip file first!)
- 5. [Optional] Previously made VCF GATK 3.5 files (see below for settings), of
+ 4. [Optional] Previously made VCF GATK 3.5 files (see below for settings), of
previously published _Y. pestis_ genomes.

We should also ensure we have the very latest version of the nf-core/eager
@@ -4474,7 +4456,7 @@ environmental relatives or other contaminants.

For this we need to run genotyping, but specifically with GATK UnifiedGenotyper
3.5 (as MultiVCFAnalyzer requires this particular format of VCF files). We will
- therefore turn on Genotyping, supply the path to the GATK 3.5 JAR file, and
+ therefore turn on Genotyping, and
check ploidy is set 2 so 'heterozygous' positions can be reported. We will also
need to specify that we want to use the trimmed bams from the previous step.

@@ -4507,7 +4489,6 @@ nextflow run nf-core/eager \
--run_genotyping \
--genotyping_tool 'ug' \
--genotyping_source 'trimmed' \
- --gatk_ug_jar '../bin/GenomeAnalysisTK.jar' \
--gatk_ploidy 2 \
--gatk_ug_mode 'EMIT_ALL_SITES' \
--gatk_ug_genotype_model 'SNP' \
@@ -4551,7 +4532,6 @@ nextflow run nf-core/eager \
--run_genotyping \
--genotyping_tool 'ug' \
--genotyping_source 'trimmed' \
- --gatk_ug_jar '../bin/GenomeAnalysisTK.jar' \
--gatk_ploidy 2 \
--gatk_ug_mode 'EMIT_ALL_SITES' \
--gatk_ug_genotype_model 'SNP' \
3 changes: 2 additions & 1 deletion environment.yml
@@ -1,4 +1,4 @@
- name: nf-core-eager-2.3.0dev
+ name: nf-core-eager-2.2.2dev
  channels:
  - conda-forge
  - bioconda
@@ -20,6 +20,7 @@ dependencies:
  - bioconda::angsd=0.933
  - bioconda::circularmapper=1.93.5
  - bioconda::gatk4=4.1.7.0
+ - bioconda::gatk=3.5
  - bioconda::qualimap=2.2.2d
  - bioconda::vcf2genome=0.91
  - bioconda::damageprofiler=0.4.9
35 changes: 7 additions & 28 deletions main.nf
@@ -136,9 +136,8 @@ def helpMessage() {

Genotyping
--run_genotyping [bool] Turn on genotyping of BAM files.
- --genotyping_tool [str] Specify which genotyper to use either GATK UnifiedGenotyper, GATK HaplotypeCaller, Freebayes, or pileupCaller. Note: UnifiedGenotyper requires user-supplied defined GATK 3.5 jar file. Options: 'ug', 'hc', 'freebayes', 'pileupcaller', 'angsd'.
+ --genotyping_tool [str] Specify which genotyper to use either GATK UnifiedGenotyper, GATK HaplotypeCaller, Freebayes, or pileupCaller. Options: 'ug', 'hc', 'freebayes', 'pileupcaller', 'angsd'.
--genotyping_source [str] Specify which input BAM to use for genotyping. Options: 'raw', 'trimmed' or 'pmd'. Default: '${params.genotyping_source}'
- --gatk_ug_jar [file] When specifying to use GATK UnifiedGenotyper, path to GATK 3.5 .jar.
--gatk_call_conf [num] Specify GATK phred-scaled confidence threshold. Default: ${params.gatk_call_conf}
--gatk_ploidy [num] Specify GATK organism ploidy. Default: ${params.gatk_ploidy}
--gatk_downsample [num] Maximum depth coverage allowed for genotyping before down-sampling is turned on. Default: ${params.gatk_downsample}
@@ -416,14 +415,6 @@ if (params.run_genotyping){
if (params.genotyping_tool != 'ug' && params.genotyping_tool != 'hc' && params.genotyping_tool != 'freebayes' && params.genotyping_tool != 'pileupcaller' && params.genotyping_tool != 'angsd' ) {
exit 1, "[nf-core/eager] error: please specify a genotyper. Options: 'ug', 'hc', 'freebayes', 'pileupcaller', 'angsd'. Found parameter: --genotyping_tool '${params.genotyping_tool}'."
}
-
- if (params.genotyping_tool == 'ug' && params.gatk_ug_jar == '') {
- exit 1, "[nf-core/eager] error: please specify path to a GATK 3.5 .jar file with --gatk_ug_jar."
- }
-
- if (params.genotyping_tool == 'ug' && !params.gatk_ug_jar.endsWith('.jar') ) {
- exit 1, "[nf-core/eager] error: please specify path with --gatk_ug_jar to a valid GATK 3.5 binary that ends with .jar!. Found parameter: --gatk_ug_jar '${params.gatk_ug_jar}'."
- }

if (params.gatk_ug_out_mode != 'EMIT_VARIANTS_ONLY' && params.gatk_ug_out_mode != 'EMIT_ALL_CONFIDENT_SITES' && params.gatk_ug_out_mode != 'EMIT_ALL_SITES') {
exit 1, "[nf-core/eager] error: please check your GATK output mode. Options are: 'EMIT_VARIANTS_ONLY', 'EMIT_ALL_CONFIDENT_SITES', 'EMIT_ALL_SITES'. Found parameter: --gatk_ug_out_mode '${params.gatk_ug_out_mode}'."
@@ -470,17 +461,6 @@ if (params.run_genotyping){
}
}

- // check manually supplied UG JAR found
- if ( params.gatk_ug_jar != '' ) {
- Channel
- .fromPath( params.gatk_ug_jar, checkIfExists: true )
- .set{ ch_unifiedgenotyper_jar }
- } else {
- Channel
- .empty()
- .set{ ch_unifiedgenotyper_jar }
- }
-
// pileupCaller channel generation and input checks for 'random sampling' genotyping
if (params.pileupcaller_bedfile.isEmpty()) {
ch_bed_for_pileupcaller = Channel.fromPath("$baseDir/assets/nf-core_eager_dummy.txt")
@@ -2516,7 +2496,6 @@ process genotyping_ug {
input:
tuple samplename, libraryid, lane, seqtype, organism, strandedness, udg, file(bam), file(bai) from ch_damagemanipulation_for_genotyping_ug
file fasta from ch_fasta_for_genotyping_ug.collect()
- file jar from ch_unifiedgenotyper_jar.collect()
file fai from ch_fai_for_ug.collect()
file dict from ch_dict_for_ug.collect()

@@ -2530,9 +2509,9 @@ process genotyping_ug {
if (params.gatk_dbsnp == '')
"""
samtools index -b ${bam}
- java -Xmx${task.memory.toGiga()}g -jar ${jar} -T RealignerTargetCreator -R ${fasta} -I ${bam} -nt ${task.cpus} -o ${samplename}.intervals ${defaultbasequalities}
- java -Xmx${task.memory.toGiga()}g -jar ${jar} -T IndelRealigner -R ${fasta} -I ${bam} -targetIntervals ${samplename}.intervals -o ${samplename}.realign.bam ${defaultbasequalities}
- java -Xmx${task.memory.toGiga()}g -jar ${jar} -T UnifiedGenotyper -R ${fasta} -I ${samplename}.realign.bam -o ${samplename}.unifiedgenotyper.vcf -nt ${task.cpus} --genotype_likelihoods_model ${params.gatk_ug_genotype_model} -stand_call_conf ${params.gatk_call_conf} --sample_ploidy ${params.gatk_ploidy} -dcov ${params.gatk_downsample} --output_mode ${params.gatk_ug_out_mode} ${defaultbasequalities}
+ gatk3 -T RealignerTargetCreator -R ${fasta} -I ${bam} -nt ${task.cpus} -o ${samplename}.intervals ${defaultbasequalities}
+ gatk3 -T IndelRealigner -R ${fasta} -I ${bam} -targetIntervals ${samplename}.intervals -o ${samplename}.realign.bam ${defaultbasequalities}
+ gatk3 -T UnifiedGenotyper -R ${fasta} -I ${samplename}.realign.bam -o ${samplename}.unifiedgenotyper.vcf -nt ${task.cpus} --genotype_likelihoods_model ${params.gatk_ug_genotype_model} -stand_call_conf ${params.gatk_call_conf} --sample_ploidy ${params.gatk_ploidy} -dcov ${params.gatk_downsample} --output_mode ${params.gatk_ug_out_mode} ${defaultbasequalities}

$keep_realign

@@ -2541,9 +2520,9 @@ process genotyping_ug {
else if (params.gatk_dbsnp != '')
"""
samtools index ${bam}
- java -jar ${jar} -T RealignerTargetCreator -R ${fasta} -I ${bam} -nt ${task.cpus} -o ${samplename}.intervals ${defaultbasequalities}
- java -jar ${jar} -T IndelRealigner -R ${fasta} -I ${bam} -targetIntervals ${samplenane}.intervals -o ${samplename}.realign.bam ${defaultbasequalities}
- java -jar ${jar} -T UnifiedGenotyper -R ${fasta} -I ${samplename}.realign.bam -o ${samplename}.unifiedgenotyper.vcf -nt ${task.cpus} --dbsnp ${params.gatk_dbsnp} --genotype_likelihoods_model ${params.gatk_ug_genotype_model} -stand_call_conf ${params.gatk_call_conf} --sample_ploidy ${params.gatk_ploidy} -dcov ${params.gatk_downsample} --output_mode ${params.gatk_ug_out_mode} ${defaultbasequalities}
+ gatk3 -T RealignerTargetCreator -R ${fasta} -I ${bam} -nt ${task.cpus} -o ${samplename}.intervals ${defaultbasequalities}
+ gatk3 -T IndelRealigner -R ${fasta} -I ${bam} -targetIntervals ${samplename}.intervals -o ${samplename}.realign.bam ${defaultbasequalities}
+ gatk3 -T UnifiedGenotyper -R ${fasta} -I ${samplename}.realign.bam -o ${samplename}.unifiedgenotyper.vcf -nt ${task.cpus} --dbsnp ${params.gatk_dbsnp} --genotype_likelihoods_model ${params.gatk_ug_genotype_model} -stand_call_conf ${params.gatk_call_conf} --sample_ploidy ${params.gatk_ploidy} -dcov ${params.gatk_downsample} --output_mode ${params.gatk_ug_out_mode} ${defaultbasequalities}

$keep_realign
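The rewritten script blocks call a `gatk3` launcher instead of `java -jar ${jar}`, so a run outside the container or Conda environment would now fail at this process if that launcher is missing. A minimal sketch of a pre-flight check is given below; `require_tool` is a hypothetical helper, and the assumption that the `bioconda::gatk=3.5` package puts `gatk3` on `PATH` is taken from how this diff invokes it.

```shell
# Hypothetical helper: return 0 if the named command is on PATH,
# otherwise report it on stderr and return 1.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    printf 'missing tool: %s\n' "$1" >&2
    return 1
}

# Example use before launching a UnifiedGenotyper run without Docker/Conda
# (assumption: the launcher is called `gatk3`, as in the script blocks above):
# require_tool gatk3 && require_tool samtools && echo "genotyping_ug prerequisites found"
```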

3 changes: 1 addition & 2 deletions nextflow.config
@@ -128,7 +128,6 @@ params {
genotyping_tool = ''
genotyping_source = 'raw'
// gatk options
- gatk_ug_jar = ''
gatk_call_conf = 30
gatk_ploidy = 2
gatk_downsample = 250
@@ -338,7 +337,7 @@ manifest {
description = 'A fully reproducible and state-of-the-art ancient DNA analysis pipeline'
mainScript = 'main.nf'
nextflowVersion = '!>=20.04.0'
- version = '2.3.0dev'
+ version = '2.2.2dev'
}

// Function to ensure that resource requirements don't go beyond
10 changes: 2 additions & 8 deletions nextflow_schema.json
@@ -877,9 +877,9 @@
},
"genotyping_tool": {
"type": "string",
- "description": "Specify which genotyper to use either GATK UnifiedGenotyper, GATK HaplotypeCaller, Freebayes, or pileupCaller. Note: UnifiedGenotyper requires user-supplied defined GATK 3.5 jar file. Options: 'ug', 'hc', 'freebayes', 'pileupcaller', 'angsd'.",
+ "description": "Specify which genotyper to use either GATK UnifiedGenotyper, GATK HaplotypeCaller, Freebayes, or pileupCaller. Options: 'ug', 'hc', 'freebayes', 'pileupcaller', 'angsd'.",
"fa_icon": "fas fa-tools",
- "help_text": "Specifies which genotyper to use. Current options are: GATK (v3.5) UnifiedGenotyper or GATK Haplotype Caller (v4); and the FreeBayes Caller. Specify 'ug', 'hc', 'freebayes', 'pileupcaller' and 'angsd' respectively.\n\n> Note that while UnifiedGenotyper is more suitable for low-coverage ancient DNA (HaplotypeCaller does _de novo_ assembly around each variant site), it is officially deprecated by the Broad Institute and is only accessible by an archived version not properly available on `conda`. Therefore if specifying 'ug', will need to supply a GATK 3.5 `-jar` to the parameter `gatk_ug_jar`. Note that this means the pipline is not fully reproducible in this configuration, unless you personally supply the `.jar` file.",
+ "help_text": "Specifies which genotyper to use. Current options are: GATK (v3.5) UnifiedGenotyper or GATK Haplotype Caller (v4); and the FreeBayes Caller. Specify 'ug', 'hc', 'freebayes', 'pileupcaller' and 'angsd' respectively.\n\n> Note that while UnifiedGenotyper is more suitable for low-coverage ancient DNA (HaplotypeCaller does _de novo_ assembly around each variant site), be aware that GATK 3.5 is officially deprecated by the Broad Institute.",
"enum": [
"ug",
"hc",
@@ -895,12 +895,6 @@
"fa_icon": "fas fa-faucet",
"help_text": "Indicates which BAM file to use for genotyping, depending on what BAM processing modules you have turned on. Options are: `'raw'` for mapped only, filtered, or DeDup BAMs (with priority right to left); `'trimmed'` (for base clipped BAMs); `'pmd'` (for pmdtools output). Default is: `'raw'`.\n"
},
- "gatk_ug_jar": {
- "type": "string",
- "description": "When specifying to use GATK UnifiedGenotyper, path to GATK 3.5 .jar.",
- "fa_icon": "fas fa-archive",
- "help_text": "Specify a path to a local copy of a GATK 3.5 `.jar` file, preferably version\n'3.5-0-g36282e4'. The download location of this may be available from the GATK\nforums or the [Google Cloud\nStorage](https://console.cloud.google.com/storage/browser/gatk-software/package-archive/gatk?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false)\nof the Broad Institute."
- },
"gatk_call_conf": {
"type": "integer",
"default": 30,