Áhkká is one of the massifs just outside of the Sarek National Park.
11
-
12
-
### Fixed
13
-
14
-
- [#566](https://github.com/nf-core/sarek/pull/566) - Fix caching bug affecting a variable number of `MapReads` jobs due to non-deterministic state of `statusMap` during caching evaluation
- [#668](https://github.com/nf-core/sarek/pull/668) - Add `snpeff_version` and `vep_version` to `schema_ignore_params` to avoid issue when specifying on command line
167
161
- [#669](https://github.com/nf-core/sarek/pull/669) - Fix path to files when creating csv files
168
162
169
163
### Deprecated
@@ -182,6 +176,14 @@ Pårtejekna is one of glaciers of the Pårte Massif.
182
176
- [#605](https://github.com/nf-core/sarek/pull/605) - Removed Scatter/gather from GATK_SINGLE_SAMPLE_GERMLINE_VARIANT_CALLING, all intervals are processed together
Áhkká is one of the massifs just outside of the Sarek National Park.
182
+
183
+
### Fixed
184
+
185
+
- [#566](https://github.com/nf-core/sarek/pull/566) - Fix caching bug affecting a variable number of `MapReads` jobs due to non-deterministic state of `statusMap` during caching evaluation
docs/usage.md (70 additions, 73 deletions)
@@ -482,8 +482,8 @@ If you have any questions or issues please send us a message on [Slack](https://
482
482
483
483
When using default parameters only, sarek runs preprocessing and exits after base quality score recalibration. This is reflected in the default test profile:
484
484
485
-
```
486
-
nextflow run nf-core/sarek -r 3.0.0 -profile test,<container/institute>
485
+
```console
486
+
nextflow run nf-core/sarek -r 3.0 -profile test,<container/institute>
487
487
```
488
488
489
489
Expected run output:
@@ -532,13 +532,13 @@ Expected run output:
532
532
533
533
The pipeline comes with a number of possible paths and tools that can be used. The easiest and fastest test to see that preprocessing and variant calling (in this case with Strelka2) work is to run:
534
534
535
-
```
535
+
```console
536
536
nextflow run nf-core/sarek -r 3.0.0 -profile test,<container/institute> --tools strelka
537
537
```
538
538
539
539
Because the test data are small, not everything can be tested from top to bottom; instead, individual parts are often tested by using the pipeline's `--step` parameter. Annotation has to be tested separately from the rest of the workflow, since it uses references for `C. elegans`, while the remaining tests are run on downsampled human data.
@@ -611,7 +611,7 @@ In addition, currently the mismatch penalty for reads with tumor status in the s
611
611
When plots are missing, it is possible that the fasta and the custom SnpEff database do not match (see https://pcingola.github.io/SnpEff/se_faq/#error_chromosome_not_found-details).
612
612
SnpEff completes without throwing an error, so Nextflow also completes successfully. The following lines in the `.command` files indicate the error:
613
613
614
-
```
614
+
```text
615
615
ERRORS: Some errors were detected
616
616
Error type Number of errors
617
617
ERROR_CHROMOSOME_NOT_FOUND 17522411
@@ -627,7 +627,7 @@ If you have problems running processes that make use of Spark such as `MarkDupli
627
627
You are probably experiencing issues with the limit of open files in your system.
628
628
You can check your current limit by typing the following:
629
629
630
-
```bash
630
+
```console
631
631
ulimit -n
632
632
```
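
As a rough sketch, the check above can be scripted so that the current soft limit is compared with the value of 65535 suggested in this section. The variable names `required`, `current` and `message` are only illustrative and are not part of sarek:

```bash
# Value suggested in this section for the open-file limit.
required=65535
# Current soft limit for the running shell; may be a number or "unlimited".
current=$(ulimit -n)

if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
    message="open-file limit OK: $current"
else
    message="open-file limit too low: $current (suggested: $required)"
fi
echo "$message"
```

If the script reports a limit that is too low, the permanent changes described in this section are one way to raise it.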
633
633
@@ -636,20 +636,20 @@ In order to increase the size limit permanently you can:
636
636
637
637
Edit the file `/etc/security/limits.conf` and add the lines:
638
638
639
-
```bash
639
+
```console
640
640
* soft nofile 65535
641
641
* hard nofile 65535
642
642
```
643
643
644
644
Edit the file `/etc/sysctl.conf` and add the line:
645
645
646
-
```bash
646
+
```console
647
647
fs.file-max = 65535
648
648
```
649
649
650
650
Edit the file `/etc/sysconfig/docker` and add the new limits to OPTIONS like this:
651
651
652
-
```bash
652
+
```console
653
653
OPTIONS="--default-ulimit nofile=65535:65535"
654
654
```
655
655
@@ -681,6 +681,36 @@ Recent updates to Samtools have been introduced, which can speed-up performance
681
681
The current workflow does not handle duplex UMIs (i.e. where opposite strands of a duplex molecule have been tagged with a different UMI), and best practices have been proposed to process this type of data.
682
682
Both changes will be implemented in a future release.
683
683
684
+
## How to run sarek when no(t all) reference files are in igenomes
685
+
686
+
For common genomes, such as GRCh38 and GRCh37, the pipeline is shipped with (almost) all necessary reference files. However, sometimes it is necessary to use custom references for some or all files:
687
+
688
+
### No igenomes reference files are used
689
+
690
+
If none of your required genome files are in igenomes, set `--igenomes_ignore` to ignore any igenomes input, together with `--genome null`. The `fasta` file is the only required input file and must be provided to run the pipeline. All other reference files can be provided in addition. For details, see the parameter documentation.
691
+
692
+
Minimal example for custom genomes:
693
+
694
+
```console
695
+
nextflow run nf-core/sarek --genome null --igenomes_ignore --fasta <custom.fasta>
696
+
```
697
+
698
+
### Overwrite specific reference files
699
+
700
+
If you don't want to use some of the provided reference files, you can override them either by providing a new file or, if a file should be ignored, by setting the respective parameter to `false`:
701
+
702
+
Example for using a custom known indels file:
703
+
704
+
```console
705
+
nextflow run nf-core/sarek --known_indels <my_known_indels.vcf.gz> --genome GATK.GRCh38
706
+
```
707
+
708
+
Example for not using known indels while keeping all other provided reference files:
709
+
710
+
```console
711
+
nextflow run nf-core/sarek --known_indels false --genome GATK.GRCh38
712
+
```
713
+
684
714
### Where do the used reference genomes originate from
Sarek comes shipped with containers for both snpEff and VEP for human reference genomes with `--genome GATK.GRCh38` and `--genome GATK.GRCh37`. Different containers however can be provided.
753
-
754
-
<!-- #### Create containers
755
-
756
-
The cache has to be downloaded.
757
-
758
-
`sareksnpeff`, our `snpeff` container is designed using [Conda](https://conda.io/).
A `Nextflow` helper script has been designed to help downloading `snpEff` and `VEP` caches.
801
-
Such files are meant to be shared between multiple users, so this script is mainly meant for people administrating servers, clusters and advanced users.
802
-
803
-
```bash
804
-
nextflow run download_cache.nf --snpeff_cache </path/to/snpEff/cache> --snpeff_db <snpEff DB version> --genome <GENOME>
Similarly, to use a different cache than the one specified in the iGenomes config file, one can use `--snpeff_db`, `--snpeff_genome`, `--snpeff_version`, `--vep_cache_version`, `--vep_genome`, `--vep_species` and `--vep_version` to override the default values for the databases, genomes, versions and cache versions used by these tools.
807
796
808
797
### Using VEP plugins
809
798
810
-
<!-- To enable the use of the `VEP``CADD` plugin:
799
+
#### dbnsfp
811
800
812
-
- Download the `CADD` files
813
-
- Specify them (either on the command line, like in the example or in a configuration file)
814
-
- use the `--cadd_cache` flag
801
+
Enable with `--vep_dbnsfp`. The following parameters are mandatory:
815
802
816
-
Example:
803
+
- `--dbnsfp`, to specify the path to the dbNSFP processed file.
804
+
- `--dbnsfp_tbi`, to specify the path to the dbNSFP tabix indexed file.
- `--dbnsfp_consequence`, to filter/limit outputs to a specific effect of the variant.
809
+
- The set of consequence terms is defined by the Sequence Ontology and an overview of those used in VEP can be found [here](https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html).
810
+
- To filter on several consequences, separate them with '&' (e.g. `--dbnsfp_consequence '3_prime_UTR_variant&intron_variant'`).
811
+
- `--dbnsfp_fields`, to retrieve individual values from the dbNSFP file.
812
+
- The values correspond to the name of the columns in the dbNSFP file and are separated by comma.
813
+
- The column names might differ between dbNSFP versions. Please check the Readme.txt file, which is provided with the dbNSFP file, to obtain the correct column names. The Readme file also contains a short description of the provided values and the version of the tools used to generate them.
827
814
828
-
An helper script has been designed to help downloading `CADD` files.
829
-
Such files are meant to be share between multiple users, so this script is mainly meant for people administrating servers, clusters and advanced users.
815
+
For more details, see [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#dbnsfp).
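
The dbNSFP parameters above can be assembled as in the following minimal sketch. The paths are hypothetical placeholders, the `join` helper is defined here only for illustration, and the column names `LRT_score` and `GERP++_RS` are examples that may not exist in your dbNSFP version. Other arguments a real run would need (such as input and tool selection) are omitted, and the command is only printed, not executed:

```bash
# Hypothetical placeholder paths; replace with your own dbNSFP files.
dbnsfp_file="/path/to/dbNSFP.gz"
# Assumption for this sketch: the tabix index sits next to the file as <file>.tbi.
dbnsfp_index="${dbnsfp_file}.tbi"

# join SEP ITEM...: concatenate the items with a single-character separator.
join() {
    sep="$1"
    shift
    old_ifs="$IFS"
    IFS="$sep"
    joined="$*"
    IFS="$old_ifs"
    printf '%s' "$joined"
}

# Several consequences are separated by '&', several fields by ','.
consequences=$(join '&' 3_prime_UTR_variant intron_variant)
fields=$(join ',' LRT_score GERP++_RS)

# Print the resulting command instead of running it.
echo "nextflow run nf-core/sarek --vep_dbnsfp --dbnsfp $dbnsfp_file --dbnsfp_tbi $dbnsfp_index --dbnsfp_consequence '$consequences' --dbnsfp_fields $fields"
```

The single quotes around the consequence list matter on a real command line, because an unquoted `&` would be interpreted by the shell.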
830
816
831
-
```bash
832
-
nextflow run download_cache.nf --cadd_cache </path/to/CADD/cache> --cadd_version <CADD version> --genome <GENOME>
833
-
``` -->
817
+
#### LOFTEE
834
818
835
-
#### dbnsfp
819
+
Enable with `--vep_loftee`.
836
820
837
-
#### LOFTEE
821
+
For more details, see [here](https://github.com/konradjk/loftee).
838
822
839
823
#### SpliceAi
840
824
825
+
Enable with `--vep_spliceai`. The following parameters are mandatory:
826
+
827
+
- `--spliceai_snv`, to specify the path to SpliceAI raw scores snv file.
828
+
- `--spliceai_snv_tbi`, to specify the path to SpliceAI raw scores snv tabix indexed file.
829
+
- `--spliceai_indel`, to specify the path to SpliceAI raw scores indel file.
830
+
- `--spliceai_indel_tbi`, to specify the path to SpliceAI raw scores indel tabix indexed file.
831
+
832
+
For more details, see [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#spliceai).
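
A minimal sketch of how the four mandatory SpliceAI parameters fit together is shown below. The file paths are hypothetical placeholders, and the sketch assumes that each tabix index sits next to its score file as `<file>.tbi`. Other arguments a real run would need are omitted, and the command is only printed, not executed:

```bash
# Hypothetical placeholder paths; replace with your own SpliceAI score files.
spliceai_snv="/path/to/spliceai_scores.raw.snv.vcf.gz"
spliceai_indel="/path/to/spliceai_scores.raw.indel.vcf.gz"

# Assumption for this sketch: each tabix index sits next to its file as <file>.tbi.
spliceai_snv_tbi="${spliceai_snv}.tbi"
spliceai_indel_tbi="${spliceai_indel}.tbi"

# Build the command step by step so that each file/index pair stays together.
cmd="nextflow run nf-core/sarek --vep_spliceai"
cmd="$cmd --spliceai_snv $spliceai_snv --spliceai_snv_tbi $spliceai_snv_tbi"
cmd="$cmd --spliceai_indel $spliceai_indel --spliceai_indel_tbi $spliceai_indel_tbi"

# Print the resulting command instead of running it.
echo "$cmd"
```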
833
+
841
834
#### SpliceRegions
842
835
836
+
Enable with `--vep_spliceregion`.
837
+
838
+
For more details, see [here](https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#spliceregion) and [here](https://www.ensembl.info/2018/10/26/cool-stuff-the-vep-can-do-splice-site-variant-annotation/).
839
+
843
840
## Requested resources for the tools
844
841
845
842
Resource requests are difficult to generalize and often depend on input data size. Currently, the default number of CPUs and amount of memory requested were adapted from tests on 5 ICGC paired whole-genome sequencing samples with approximately 40X and 80X depth.