Commit 6a893cc

merge upstream/dev
2 parents 10cd8a2 + ce77595 commit 6a893cc

6 files changed

Lines changed: 88 additions & 91 deletions

CHANGELOG.md

Lines changed: 12 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -5,17 +5,9 @@ All notable changes to this project will be documented in this file.
55
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
66
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
77

8-
## [2.7.2](https://github.com/nf-core/sarek/releases/tag/2.7.2) - Áhkká
9-
10-
Áhkká is one of the massifs just outside of the Sarek National Park.
11-
12-
### Fixed
13-
14-
- [#566](https://github.com/nf-core/sarek/pull/566) - Fix caching bug affecting a variable number of `MapReads` jobs due to non-deterministic state of `statusMap` during caching evaluation
15-
16-
## [2.7.1](https://github.com/nf-core/sarek/releases/tag/2.7.1) - Pårtejekna
8+
## [3.0](https://github.com/nf-core/sarek/releases/tag/3.0) - Skierfe
179

18-
Pårtejekna is one of glaciers of the Pårte Massif.
10+
Skierfe is a mountain in the Sarek national park, and the inspiration for the logo.
1911

2012
### Added
2113

@@ -112,6 +104,7 @@ Pårtejekna is one of glaciers of the Pårte Massif.
112104
- [#659](https://github.com/nf-core/sarek/pull/659) - Update usage.md docu section on `How to run ASCAT with WES`
113105
- [#661](https://github.com/nf-core/sarek/pull/661) - Add cnvkit reference creation to index subway map
114106
- [#663](https://github.com/nf-core/sarek/pull/663) - Add separate parameters for `ASCAT` and `ControlFREEC` back in
107+
- [#668](https://github.com/nf-core/sarek/pull/668) - Update annotation documentation
115108

116109
### Fixed
117110

@@ -164,6 +157,7 @@ Pårtejekna is one of glaciers of the Pårte Massif.
164157
- [#655](https://github.com/nf-core/sarek/pull/655) - Fix `--intervals false` logic & add versioning for local modules
165158
- [#658](https://github.com/nf-core/sarek/pull/658) - Fix split fastq names in multiqc-report
166159
- [#666](https://github.com/nf-core/sarek/pull/666) - Simplify multiqc config channel input
160+
- [#668](https://github.com/nf-core/sarek/pull/668) - Add `snpeff_version` and `vep_version` to `schema_ignore_params` to avoid issue when specifying on command line
167161
- [#669](https://github.com/nf-core/sarek/pull/669) - Fix path to files when creating csv files
168162

169163
### Deprecated
@@ -182,6 +176,14 @@ Pårtejekna is one of glaciers of the Pårte Massif.
182176
- [#605](https://github.com/nf-core/sarek/pull/605) - Removed Scatter/gather from GATK_SINGLE_SAMPLE_GERMLINE_VARIANT_CALLING, all intervals are processed together
183177
- [#643](https://github.com/nf-core/sarek/pull/643) - Removed Sentieon parameters
184178

179+
## [2.7.2](https://github.com/nf-core/sarek/releases/tag/2.7.2) - Áhkká
180+
181+
Áhkká is one of the massifs just outside of the Sarek National Park.
182+
183+
### Fixed
184+
185+
- [#566](https://github.com/nf-core/sarek/pull/566) - Fix caching bug affecting a variable number of `MapReads` jobs due to non-deterministic state of `statusMap` during caching evaluation
186+
185187
## [2.7.1](https://github.com/nf-core/sarek/releases/tag/2.7.1) - Pårtejekna
186188

187189
Pårtejekna is one of glaciers of the Pårte Massif.

conf/test.config

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ params {
4444
vep_version = '106.1'
4545

4646
// Ignore params that will throw warning through params validation
47-
schema_ignore_params = "genomes,test_data"
47+
schema_ignore_params = 'genomes,test_data,snpeff_version,vep_version'
4848
}
4949

5050
profiles {

conf/test_full.config

Lines changed: 0 additions & 1 deletion
@@ -19,7 +19,6 @@ params {
1919

2020
// Other params
2121
tools = 'strelka,freebayes,haplotypecaller,deepvariant,manta,tiddit,cnvkit,vep'
22-
schema_ignore_params = 'genomes'
2322

2423
split_fastq = 50000000
2524
}

conf/test_full_somatic.config

Lines changed: 2 additions & 3 deletions
@@ -19,8 +19,7 @@ params {
1919

2020
// Other params
2121
tools = 'strelka,mutect2,freebayes,ascat,manta,cnvkit,tiddit,controlfreec,vep'
22-
schema_ignore_params = 'genomes'
23-
wes = true
24-
intervals = 's3://nf-core-awsmegatests/sarek/input/S07604624_Padded_Agilent_SureSelectXT_allexons_V6_UTR.bed'
2522
split_fastq = 20000000
23+
intervals = 's3://nf-core-awsmegatests/sarek/input/S07604624_Padded_Agilent_SureSelectXT_allexons_V6_UTR.bed'
24+
wes = true
2625
}

docs/usage.md

Lines changed: 70 additions & 73 deletions
@@ -482,8 +482,8 @@ If you have any questions or issues please send us a message on [Slack](https://
482482
483483
When using default parameters only, sarek runs preprocessing and exits after base quality score recalibration. This is reflected in the default test profile:
484484
485-
```
486-
nextflow run nf-core/sarek -r 3.0.0 -profile test,<container/institute>
485+
```console
486+
nextflow run nf-core/sarek -r 3.0 -profile test,<container/institute>
487487
```
488488

489489
Expected run output:
@@ -532,13 +532,13 @@ Expected run output:
532532

533533
The pipeline comes with a number of possible paths and tools that can be used. The easiest and fastest test to see that the preprocessing + variantcalling (in this case Strelka2) works, is to run:
534534

535-
```
535+
```console
536536
nextflow run nf-core/sarek -r 3.0.0 -profile test,<container/institute> --tools strelka
537537
```
538538

539539
Due to the small test data size, unfortunately not everything can be tested from top to bottom; instead, individual parts are often tested by utilizing the pipeline's `--step` parameter. Annotation has to be tested separately from the remaining workflow, since we use references for `C.elegans`, while the remaining tests are run on downsampled human data.
540540

541-
```
541+
```console
542542
nextflow run nf-core/sarek -r 3.0.0 -profile test,<container/institute> --tools snpeff --step annotation
543543
```
544544

@@ -611,7 +611,7 @@ In addition, currently the mismatch penalty for reads with tumor status in the s
611611
When plots are missing, it is possible that the fasta and the custom SnpEff database are not matching https://pcingola.github.io/SnpEff/se_faq/#error_chromosome_not_found-details.
612612
SnpEff completes without throwing an error, so Nextflow also completes successfully. An indication of the error is the following lines in the `.command` files:
613613

614-
```
614+
```text
615615
ERRORS: Some errors were detected
616616
Error type Number of errors
617617
ERROR_CHROMOSOME_NOT_FOUND 17522411
@@ -627,7 +627,7 @@ If you have problems running processes that make use of Spark such as `MarkDupli
627627
You are probably experiencing issues with the limit of open files in your system.
628628
You can check your current limit by typing the following:
629629

630-
```bash
630+
```console
631631
ulimit -n
632632
```
633633

@@ -636,20 +636,20 @@ In order to increase the size limit permanently you can:
636636

637637
Edit the file `/etc/security/limits.conf` and add the lines:
638638

639-
```bash
639+
```console
640640
* soft nofile 65535
641641
* hard nofile 65535
642642
```
643643

644644
Edit the file `/etc/sysctl.conf` and add the line:
645645

646-
```bash
646+
```console
647647
fs.file-max = 65535
648648
```
649649

650650
Edit the file `/etc/sysconfig/docker` and add the new limits to OPTIONS like this:
651651

652-
```bash
652+
```console
653653
OPTIONS="--default-ulimit nofile=65535:65535"
654654
```
655655

@@ -681,6 +681,36 @@ Recent updates to Samtools have been introduced, which can speed-up performance
681681
The current workflow does not handle duplex UMIs (i.e. where opposite strands of a duplex molecule have been tagged with a different UMI), and best practices have been proposed to process this type of data.
682682
Both changes will be implemented in a future release.
683683

684+
## How to run sarek when no(t all) reference files are in igenomes
685+
686+
For common genomes, such as GRCh38 and GRCh37, the pipeline is shipped with (almost) all necessary reference files. However, sometimes it is necessary to use custom references for some or all files:
687+
688+
### No igenomes reference files are used
689+
690+
If none of your required genome files are in igenomes, set `--igenomes_ignore` to ignore any igenomes input, together with `--genome null`. The `fasta` file is the only required input file and must be provided to run the pipeline. All other reference files can be provided in addition. For details, see the parameter documentation.
691+
692+
Minimal example for custom genomes:
693+
694+
```console
695+
nextflow run nf-core/sarek --genome null --igenomes_ignore --fasta <custom.fasta>
696+
```
697+
698+
### Overwrite specific reference files
699+
700+
If you don't want to use some of the provided reference genomes, they can be overwritten by either providing a new file or setting the respective file parameter to `false`, if it should be ignored:
701+
702+
Example for using a custom known indels file:
703+
704+
```console
705+
nextflow run nf-core/sarek --known_indels <my_known_indels.vcf.gz> --genome GRCh38.GATK
706+
```
707+
708+
Example for not using known indels, but all other provided reference files:
709+
710+
```console
711+
nextflow run nf-core/sarek --known_indels false --genome GRCh38.GATK
712+
```
713+
684714
### Where do the used reference genomes originate from
685715

686716
_under construction - help needed_
@@ -747,40 +777,7 @@ nextflow run nf-core/sarek --known_indels false --genome GRCh38.GATK
747777

748778
## How to customise SnpEff and VEP annotation
749779

750-
_under construction help needed_
751-
752-
Sarek comes shipped with containers for both snpEff and VEP for human reference genomes with `--genome GATK.GRCh38` and `--genome GATK.GRCh37`. Different containers however can be provided.
753-
754-
<!-- #### Create containers
755-
756-
The cache has to be downloaded.
757-
758-
`sareksnpeff`, our `snpeff` container is designed using [Conda](https://conda.io/).
759-
760-
[![sareksnpeff-docker status](https://img.shields.io/docker/automated/nfcore/sareksnpeff.svg)](https://hub.docker.com/r/nfcore/sareksnpeff)
761-
762-
Based on [nfcore/base:1.12.1](https://hub.docker.com/r/nfcore/base/tags), it contains:
763-
764-
- **[snpEff](http://snpeff.sourceforge.net/)** 4.3.1t
765-
- Cache for `GRCh37`, `GRCh38`, `GRCm38`, `CanFam3.1` or `WBcel235`
766-
767-
`sarekvep`, our `vep` container is designed using [Conda](https://conda.io/).
768-
769-
[![sarekvep-docker status](https://img.shields.io/docker/automated/nfcore/sarekvep.svg)](https://hub.docker.com/r/nfcore/sarekvep)
770-
771-
Based on [nfcore/base:1.12.1](https://hub.docker.com/r/nfcore/base/tags), it contains:
772-
773-
- **[GeneSplicer](https://ccb.jhu.edu/software/genesplicer/)** 1.0
774-
- **[VEP](https://github.com/Ensembl/ensembl-vep)** 99.2
775-
- Cache for `GRCh37`, `GRCh38`, `GRCm38`, `CanFam3.1` or `WBcel235` -->
776-
777-
<!-- "snpeff_db"
778-
"snpeff_genome":
779-
"snpeff_version":
780-
"vep_genome":
781-
"vep_species":
782-
"vep_cache_version":
783-
"vep_version": -->
780+
Sarek uses nf-core provided containers for both snpEff and VEP for several reference genomes ('CanFam3', 'GRCh37', 'GRCh38', 'GRCm38' and 'WBcel235').
784781

785782
### Using downloaded cache
786783

@@ -790,56 +787,56 @@ You need to specify the cache directory using `--snpeff_cache` and `--vep_cache`
790787

791788
Example:
792789

793-
```bash
790+
```console
794791
nextflow run nf-core/sarek --tools snpEff --step annotate --sample <file.vcf.gz> --snpeff_cache </path/to/snpEff/cache>
795792
nextflow run nf-core/sarek --tools VEP --step annotate --sample <file.vcf.gz> --vep_cache </path/to/VEP/cache>
796793
```
797794

798-
### Download cache
799-
800-
A `Nextflow` helper script has been designed to help downloading `snpEff` and `VEP` caches.
801-
Such files are meant to be shared between multiple users, so this script is mainly meant for people administrating servers, clusters and advanced users.
802-
803-
```bash
804-
nextflow run download_cache.nf --snpeff_cache </path/to/snpEff/cache> --snpeff_db <snpEff DB version> --genome <GENOME>
805-
nextflow run download_cache.nf --vep_cache </path/to/VEP/cache> --species <species> --vep_cache_version <VEP cache version> --genome <GENOME>
806-
```
795+
Similarly, to use a different cache than the one specified in the iGenomes config file, one can use `--snpeff_db`, `--snpeff_genome`, `--snpeff_version`, `--vep_cache_version`, `--vep_genome`, `--vep_species` and `--vep_version` to overwrite the default values for the databases, genomes, tool versions and cache versions used by these tools.
807796

808797
### Using VEP plugins
809798

810-
<!-- To enable the use of the `VEP` `CADD` plugin:
799+
#### dbnsfp
811800

812-
- Download the `CADD` files
813-
- Specify them (either on the command line, like in the example or in a configuration file)
814-
- use the `--cadd_cache` flag
801+
Enable with `--vep_dbnsfp`. The following parameters are mandatory:
815802

816-
Example:
803+
- `--dbnsfp`, to specify the path to the dbNSFP processed file.
804+
- `--dbnsfp_tbi`, to specify the path to the dbNSFP tabix indexed file.
817805

818-
```bash
819-
nextflow run nf-core/sarek --step annotate --tools VEP --sample <file.vcf.gz> --cadd_cache \
820-
--cadd_indels </path/to/CADD/cache/InDels.tsv.gz> \
821-
--cadd_indels_tbi </path/to/CADD/cache/InDels.tsv.gz.tbi> \
822-
--cadd_wg_snvs </path/to/CADD/cache/whole_genome_SNVs.tsv.gz> \
823-
--cadd_wg_snvs_tbi </path/to/CADD/cache/whole_genome_SNVs.tsv.gz.tbi>
824-
```
806+
The following parameters are optional:
825807

826-
#### Downloading CADD files
808+
- `--dbnsfp_consequence`, to filter/limit outputs to a specific effect of the variant.
809+
  - The set of consequence terms is defined by the Sequence Ontology and an overview of those used in VEP can be found [here](https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html).
810+
  - If one wants to filter using several consequences, then separate those by using '&' (e.g. `--dbnsfp_consequence '3_prime_UTR_variant&intron_variant'`).
811+
- `--dbnsfp_fields`, to retrieve individual values from the dbNSFP file.
812+
  - The values correspond to the names of the columns in the dbNSFP file and are separated by commas.
813+
  - The column names might differ between the different dbNSFP versions. Please check the Readme.txt file, which is provided with the dbNSFP file, to obtain the correct column names. The Readme file also contains a short description of the provided values and the version of the tools used to generate them.
827814

828-
An helper script has been designed to help downloading `CADD` files.
829-
Such files are meant to be share between multiple users, so this script is mainly meant for people administrating servers, clusters and advanced users.
815+
For more details, see [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#dbnsfp).
830816

831-
```bash
832-
nextflow run download_cache.nf --cadd_cache </path/to/CADD/cache> --cadd_version <CADD version> --genome <GENOME>
833-
``` -->
817+
#### LOFTEE
834818

835-
#### dbnsfp
819+
Enable with `--vep_loftee`.
836820

837-
#### LOFTEE
821+
For more details, see [here](https://github.com/konradjk/loftee).
838822

839823
#### SpliceAi
840824

825+
Enable with `--vep_spliceai`. The following parameters are mandatory:
826+
827+
- `--spliceai_snv`, to specify the path to SpliceAI raw scores snv file.
828+
- `--spliceai_snv_tbi`, to specify the path to SpliceAI raw scores snv tabix indexed file.
829+
- `--spliceai_indel`, to specify the path to SpliceAI raw scores indel file.
830+
- `--spliceai_indel_tbi`, to specify the path to SpliceAI raw scores indel tabix indexed file.
831+
832+
For more details, see [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#spliceai).
833+
841834
#### SpliceRegions
842835

836+
Enable with `--vep_spliceregion`.
837+
838+
For more details, see [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#spliceregion) and [here](https://www.ensembl.info/2018/10/26/cool-stuff-the-vep-can-do-splice-site-variant-annotation/).
839+
843840
## Requested resources for the tools
844841

845842
Resource requests are difficult to generalize and are often dependent on input data size. Currently, the number of cpus and memory requested by default were adapted from tests on 5 ICGC paired whole-genome sequencing samples with approximately 40X and 80X depth.
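
The dbNSFP parameters documented in the `docs/usage.md` change above can be combined roughly as in the following sketch. It only assembles and prints a command line rather than running the pipeline. The file paths are hypothetical placeholders, `--step annotate` and `--tools VEP` are taken from the cache example in the same file, and the input option is left out on purpose; treat the result as a starting point, not as a verified invocation.

```shell
#!/bin/sh
# Sketch only: build (but do not run) a sarek command line that enables the
# dbNSFP VEP plugin. The paths below are hypothetical placeholders.
dbnsfp_file="/path/to/dbNSFP.gz"
dbnsfp_index="${dbnsfp_file}.tbi"

build_dbnsfp_cmd() {
    # $1: dbNSFP processed file, $2: its tabix index
    printf '%s' "nextflow run nf-core/sarek --step annotate --tools VEP"
    # The format string is reused for every remaining argument.
    printf ' %s' "--vep_dbnsfp" "--dbnsfp" "$1" "--dbnsfp_tbi" "$2"
    printf '\n'
}

# Print the command so it can be reviewed before being run by hand.
build_dbnsfp_cmd "$dbnsfp_file" "$dbnsfp_index"
```

Optional flags such as `--dbnsfp_consequence` or `--dbnsfp_fields` would be appended in the same way; quote the `&`-separated consequence list so the shell does not interpret it.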

nextflow.config

Lines changed: 3 additions & 3 deletions
@@ -99,7 +99,7 @@ params {
9999
help = false
100100
validate_params = true
101101
show_hidden_params = false
102-
schema_ignore_params = 'genomes'
102+
schema_ignore_params = 'genomes,snpeff_version,vep_version'
103103
enable_conda = false
104104

105105
// Config options
@@ -182,8 +182,8 @@ profiles {
182182
podman.enabled = false
183183
shifter.enabled = false
184184
}
185-
test { includeConfig 'conf/test.config' }
186-
test_full { includeConfig 'conf/test_full.config' }
185+
test { includeConfig 'conf/test.config' }
186+
test_full { includeConfig 'conf/test_full.config' }
187187
test_full_somatic { includeConfig 'conf/test_full_somatic.config' }
188188
}
189189
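
The `schema_ignore_params` changes in this commit add `snpeff_version` and `vep_version` to a comma-separated list of parameters that are exempt from schema validation. The sketch below illustrates that idea in plain shell. It is an assumption-laden illustration of how such a list can be consulted, not the pipeline's actual validation code, and `is_ignored` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: test whether a parameter name appears in a comma-separated
# ignore list. This mimics the idea behind schema_ignore_params; it is not
# how the pipeline itself implements validation.
schema_ignore_params='genomes,snpeff_version,vep_version'

is_ignored() {
    # $1: parameter name, $2: comma-separated ignore list
    # Wrapping both sides in commas avoids partial-name matches.
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

for p in vep_version tools; do
    if is_ignored "$p" "$schema_ignore_params"; then
        echo "$p: skipped by validation"
    else
        echo "$p: validated against the schema"
    fi
done
```

With the list from `nextflow.config`, `vep_version` is reported as skipped while `tools` is still validated.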
