Merged
4 changes: 4 additions & 0 deletions .markdownlint.json
@@ -0,0 +1,4 @@
{
"MD013": false,
"MD024": { "siblings_only": true }
}
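The new `.markdownlint.json` turns off MD013 (line length) and restricts MD024 (duplicate headings) to siblings. With that setting, a heading text may repeat as long as it sits under a different parent heading, as presumably happens with per-release subsections in `CHANGELOG.md`. The bash/awk function below is only a rough approximation of that MD024 behaviour, included for illustration. It is not markdownlint itself, and the function name `check_sibling_duplicates` is hypothetical.

```bash
#!/usr/bin/env bash
# Rough approximation of markdownlint MD024 with {"siblings_only": true}.
# Hypothetical helper for illustration only: ATX headings ("## Title") are
# handled, setext headings and fenced code blocks are NOT, so a "# comment"
# line inside a fence may be reported by mistake.
set -euo pipefail

check_sibling_duplicates() {
  awk '
    # Each input file gets a fresh heading history.
    FNR == 1 { split("", seen) }
    /^#+ / {
      match($0, /^#+/)
      level = RLENGTH
      title = substr($0, level + 2)
      # A heading at this level opens a new parent scope for deeper levels:
      # collect their keys first, then delete them.
      n = 0
      for (key in seen) {
        split(key, parts, SUBSEP)
        if (parts[1] + 0 > level) stale[++n] = key
      }
      for (i = 1; i <= n; i++) delete seen[stale[i]]
      # Same level and same text under the same parent: a duplicate sibling.
      if ((level, title) in seen)
        printf "%s:%d: duplicate sibling heading: %s\n", FILENAME, FNR, $0
      seen[level, title] = 1
    }
  ' "$@"
}
```

For example, `check_sibling_duplicates CHANGELOG.md` stays silent when `### Added` appears once under each release heading, but reports a second `### Added` placed under the same release.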
583 changes: 293 additions & 290 deletions CHANGELOG.md

Large diffs are not rendered by default.

20 changes: 11 additions & 9 deletions README.md
@@ -1,9 +1,6 @@
# [![Sarek](docs/images/Sarek_logo.png "Sarek")](https://sarek.scilifelab.se/)
[![nf-core](docs/images/nf-core_logo.png "Sarek")](https://nf-co.re/)
# [![Sarek](docs/images/nf-core_sarek_logo.png "Sarek")](https://sarek.scilifelab.se/)

**An open-source analysis pipeline to detect germline or somatic variants from whole genome or targeted sequencing**

> :warning: This pipeline is a work in progress being ported to nf-core from [SciLifeLab/Sarek](https://github.com/SciLifeLab/Sarek/)
> **An open-source analysis pipeline to detect germline or somatic variants from whole genome or targeted sequencing**

[![Nextflow version][nextflow-badge]](https://www.nextflow.io/)
[![nf-core][nf-core-badge]](https://nf-co.re/)
@@ -17,7 +14,10 @@

[![Join us on Slack][slack-badge]](https://nfcore.slack.com/messages/CGFUX04HZ/)

> :warning: This pipeline is a work in progress being ported to nf-core from [SciLifeLab/Sarek](https://github.com/SciLifeLab/Sarek/)

## Introduction

<img align="right" title="CAW" src="/docs/images/CAW_logo.png">

Previously known as the Cancer Analysis Workflow (CAW),
@@ -32,6 +32,7 @@ Thus making installation trivial and results highly reproducible.
It's listed on the [Elixir - Tools and Data Services Registry](https://bio.tools/Sarek), [Dockstore](https://dockstore.org/workflows/github.com/SciLifeLab/Sarek/) and [omicX - Bioinformatics tools](https://omictools.com/sarek-tool).

## Documentation

The nf-core/sarek pipeline comes with documentation about the pipeline, found in the `docs/` directory:

1. [Installation](https://nf-co.re/usage/installation)
@@ -57,10 +58,12 @@ The nf-core/sarek pipeline comes with documentation about the pipeline, found in
Sarek was developed at the [National Genomics Infrastructure][ngi-link] and [National Bioinformatics Infrastructure Sweden][nbis-link], which are both platforms at [SciLifeLab][scilifelab-link], with the support of [The Swedish Childhood Tumor Biobank (Barntumörbanken)][btb-link].

Main authors:

* [Maxime Garcia](https://github.com/MaxUlysse)
* [Szilveszter Juhos](https://github.com/szilvajuhos)

Helpful contributors:

* [Johannes Alneberg](https://github.com/alneberg)
* [Phil Ewels](https://github.com/ewels)
* [Jesper Eisfeldt](https://github.com/J35P312)
@@ -87,21 +90,20 @@ For further information or help, don't hesitate to get in touch on [Slack](https
* [CHANGELOG](CHANGELOG.md)

## Acknowledgements

[![Barntumörbanken](docs/images/BTB_logo.png)](https://ki.se/forskning/barntumorbanken-0) | [![SciLifeLab](docs/images/SciLifeLab_logo.png)](https://scilifelab.se)
:-:|:-:
[![National Genomics Infrastructure](docs/images/NGI_logo.png)](https://ngisweden.scilifelab.se/) | [![National Bioinformatics Infrastructure Sweden](docs/images/NBIS_logo.png)](https://nbis.se)


## Citation

If you use nf-core/sarek for your analysis, please cite the `Sarek` pre-print as follows:
Garcia MU, Juhos S, Larsson M, Olason PI, Martin M, Eisfeldt J, DiLorenzo S, Sandgren J, de Ståhl TD, Wirta V, Nistér M, Nystedt B, Käller M. **Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants**. *bioRxiv*. 2018. p. 316976. [doi: 10.1101/316976](https://www.biorxiv.org/content/10.1101/316976v1).
> Garcia MU, Juhos S, Larsson M, Olason PI, Martin M, Eisfeldt J, DiLorenzo S, Sandgren J, de Ståhl TD, Wirta V, Nistér M, Nystedt B, Käller M. **Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants**. *bioRxiv*. 2018. p. 316976. [doi: 10.1101/316976](https://www.biorxiv.org/content/10.1101/316976v1).

You can cite the Sarek Zenodo record for a specific version using the following [doi: 10.5281/zenodo.2582812](https://doi.org/10.5281/zenodo.2582812).

You can cite the `nf-core` pre-print as follows:
Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v3).

> Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v3).

[bioconda-badge]: https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACQAAAAkCAYAAADhAJiYAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH4wUJDSc29Eu32QAACDBJREFUWMO1mGtwVOUZx//P875nz16zSxIgASKxI4URbWurDrYzWlr90GIRQZM14aK0oxVn1A86fvKD44eOnVHH6ThgR4cSLiaUFhzsdFov2NJ4+4C3ip2iYkwg2JIEsmd3z57zvu/TD8o9ASLp82nP7tk9v32e//vcCOdpql3B9tjj1+kV6YY4jn+mlb48tvFUxUqY+FBkon0ABgF8xsz9DemG4NBzh9z5PofOCtGmYLfak6F+oki1OnE8s2Hmtv7D/deIyJMCaT3zh8mC8BaB3tBab4+2RL0AkF2RRbAxmDiQLmqYbgO6hRoKucK8KI6iSlT5KxP/2zk3Xym1W0TW+56/u1qr7hFI01n+W1WxetX22BsBQLdrmB4z5o18+hv1d9QDAEy3gSqqZcR0eGT9SK8VuxBAQESjAEIIktbZZxbMWXBEsbqBiGpnAUpZZxdRG8WqXS06BjPrl7PODtR0ZxOG1w/jigevSFAbrbXWbtNK/4bbOFWLa/clVOKP1tprFavXBdJIIOzZv+fKbCo7BKB2Ln2IiLbOvki30mMAMLBuAE13NZ07ZN5t3pLYxDcqVlMJ1CWQvSLSC+BjYjoCh5lKqfeNNVctnLdw/q6PdvU6uO9hAsbM21yPu/UM3Z5ydTXgX+dnxcmAc+7vWumeS2df+kEYhSUAfcaapcxcIaIREcnnM/niJ//95CoRWUBEB4gpEpEYQHIsOZzmrUv5Mk7Jh/KyLmq4f7oTHvI7fNS21JBflZ9Rqpb+wMTTnTgNoCGXyhWDMPgHga6bVT/rjYHhgZsUq6RA9kfPRztPf1DLXS1TDh45uFRErhRIu4hMGf9EEZRSPzXd5s/ZlVkEXcGpIVPtaqcipSMbLSSiQ0QUisi7GT9zf6VWOZDQiXvCLeG6ZEdSh1tCg0VAa2srPnv6MwBAbkUOpY0lAEDhjoKqRtVZ1tqnjDU3jQtFFMpWSZ2A/AGAXiB/ez5VCSsHjDXkaW9XbOLriajf056xsb0qkUisrmyurMssz6C8qXxOjdStqsPohtEvNVn0no5tvGbcfMdqq+2x7cdDll6RbnXWDRlrPnRwZQIFIpIFkNKsD0Xd0QJcgGV+keFaqbbWOHPneHlKs/6h6TFv87TV0ziKooejONICecU5N8+Jm5nQifcF0uBp78Fj30oUExOG0UWN8rNlZ3rMXZ7y/jJunhK7pOXuFuKRykjGOnsZGM9m/Mwa3/M3Q9AV2ej1DfdvKJT3lncfS2BRdzRhINN9IiNbsWuIaLy68f2Dwwez7JybTURzxUm/0grN+ea7AQwyuHn1U6vvLVxW0APrBnCh5hU9pBKpzwFsHTsP4GoRmcHJRNICWDanec5zQTXo7h/unw/Cj40zDwEIj3QdMZgEi7tjlDeWDYHeAeDO5JEUM7dwzdY+Fcht+wb3vWusmZfQiTtvvvzmJUy8J5/K78QkW1Oh6U9ENDLWZ05cM3Ebd4CwAcCQiBgmTilWr13SfEnn3if2hvg/GLVRn4hcNMbxf4i10jc467Szbro4mWmdnSKQpQcOH1gOAINHBycN5HDl8Jd1jDgYR0fNuiHXsHuoNJTxtV/1tIdStcQEuiiIghEAaM43TxpQY7rxGJCzsGPWEj14ZNAj0K2xjQOEcCAoxWpL2kvvKaF0vM5NhqWWp1DdVEVs49w4te0L1qQ3kNAmAFkQEkz8QcpP/ToIg93ZVdnGyYIBgOqmKgqrCzMApMeOmBxkrfS3mPnlXDL3o4yf+dWU9JTOoB
LsEMjMMAyXTLagR4PRxQDqxym0gzo2cdXB/a5cK79GoLtFpJWYniOhFhCm1a2s06NHRw1euDCQdGcaSikVhMF3zujDvkpV1tp+ZsV9AN62ztZnk9mGmqk9y+AOEBpSXuq90WCU66fXX7BnKpsrCKNwFoD2cbzzJjMPcNJLlgH8K+kl7wvC4F5r7cVO3DcYPL8Ull70k/7Dw78dxrFGbqKW7EieSHxwj4tIYZxb38wms2UudZWsVnp77OKPBNLGzAdBiK3YFsXqUwBrACC9PJ39OgIPt4R4ZO0jpIv6SWvtsrOMSS8c3XDUMQDEz8c7nHPfdOJgrZ1BoC8AeE5cRiBlbufrIhM9drwNue3cbUimM3P89aN/e/QJY83947YoSr9quk3vKWNQPpN/H0CY8BLvOHFzScgSEZx17zHxXGPNGl3U6/wOf1pduu64KOtW1o0ZHmZWiWJitiqqbWeDISJrus2Np4xByc4kws0h/A7/+lpce0krPWCd9UVkaiFTeLpULYXE1OSsu4aI+jTrt5y4fTOmzNjet7bvlEKZX5WfVg7LS6zY7wJoO2eTz2qx6TE7uZ3hetyJJj/VmQKBWCt9bVANHnJw0xIq8ZKnvVdiG/vW2icdXIti9R8RqXPifAiOAqhopYcBkLGmXiBpAA3jHO3TvfO4bJUHTh6txxwUcytyC6tx9QZn3cWseFkqkWoLwmC7Zj1gnZ0KQuCca7iQNMDg7e73buk5Z3sAaMw2vu3ENTm4orHGIyLvq37FBxARSF0IDBE9fgzmZA2OvWy4vR771+4vr1+1frXH3i1EhHJY/jkzf2ydrQehZp3Nf00Qq5RaLFvlAQDIr8xjtGv0/NYxiWICUXeE6fdM94eHhp8B0BjbeJFiVbLO5iZaVxWrXbbHLjp51TPhhZXf6aO2+ctkqNv1t524TqXU3NjEi88DwhLR6wDe0krviJ+PewFAFRVst/16G7QzxL48p0ITZqyzs4moBYImZm7VWjdBILGNB0WkD8CgiHzOxAN5Px8MdQ2d90rvf5LI8eoQ7hrVAAAAAElFTkSuQmCC
[btb-link]: https://ki.se/forskning/barntumorbanken-0
44 changes: 22 additions & 22 deletions conf/igenomes.config
@@ -10,32 +10,32 @@
params {
genomes {
'GRCh37' {
acLoci = "${params.igenomes_base}/Annotation/ASCAT/1000G_phase3_20130502_SNP_maf0.3.loci"
acLociGC = "${params.igenomes_base}/Annotation/ASCAT/1000G_phase3_20130502_SNP_maf0.3.loci.gc"
bwaIndex = "${params.igenomes_base}/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.{amb,ann,bwt,pac,sa}"
dbsnp = "${params.igenomes_base}/Annotation/GATKBundle/dbsnp_138.b37.vcf"
dbsnpIndex = "${params.igenomes_base}/Annotation/GATKBundle/dbsnp_138.b37.vcf.idx"
genomeDict = "${params.igenomes_base}/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.dict"
genomeFile = "${params.igenomes_base}/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta"
genomeIndex = "${params.igenomes_base}/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta.fai"
intervals = "${params.igenomes_base}/Annotation/intervals/wgs_calling_regions_CAW.list"
knownIndels = "${params.igenomes_base}/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf"
knownIndelsIndex = "${params.igenomes_base}/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf.idx"
acLoci = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/ASCAT/1000G_phase3_20130502_SNP_maf0.3.loci"
acLociGC = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/ASCAT/1000G_phase3_20130502_SNP_maf0.3.loci.gc"
bwaIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex/human_g1k_v37_decoy.fasta.{amb,ann,bwt,pac,sa}"
dbsnp = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf"
dbsnpIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf.idx"
genomeDict = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.dict"
genomeFile = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta"
genomeIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/WholeGenomeFasta/human_g1k_v37_decoy.fasta.fai"
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/intervals/wgs_calling_regions_CAW.list"
knownIndels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf"
knownIndelsIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/{1000G_phase1,Mills_and_1000G_gold_standard}.indels.b37.vcf.idx"
snpeffDb = "GRCh37.75"
vepCacheVersion = "95"
}
'GRCh38' {
acLoci = "${params.igenomes_base}/Annotation/ASCAT/1000G_phase3_GRCh38_maf0.3.loci"
acLociGC = "${params.igenomes_base}/Annotation/ASCAT/1000G_phase3_GRCh38_maf0.3.loci.gc"
bwaIndex = "${params.igenomes_base}/Sequence/BWAIndex/Homo_sapiens_assembly38.fasta.64.{alt,amb,ann,bwt,pac,sa}"
dbsnp = "${params.igenomes_base}/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz"
dbsnpIndex = "${params.igenomes_base}/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz.tbi"
genomeDict = "${params.igenomes_base}/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.dict"
genomeFile = "${params.igenomes_base}/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta"
genomeIndex = "${params.igenomes_base}/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta.fai"
intervals = "${params.igenomes_base}/Annotation/intervals/wgs_calling_regions.hg38.bed"
knownIndels = "${params.igenomes_base}/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz"
knownIndelsIndex = "${params.igenomes_base}/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi"
acLoci = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/ASCAT/1000G_phase3_GRCh38_maf0.3.loci"
acLociGC = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/ASCAT/1000G_phase3_GRCh38_maf0.3.loci.gc"
bwaIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/BWAIndex/Homo_sapiens_assembly38.fasta.64.{alt,amb,ann,bwt,pac,sa}"
dbsnp = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz"
dbsnpIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/dbsnp_146.hg38.vcf.gz.tbi"
genomeDict = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.dict"
genomeFile = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta"
genomeIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta.fai"
intervals = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/intervals/wgs_calling_regions.hg38.bed"
knownIndels = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz"
knownIndelsIndex = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/{Mills_and_1000G_gold_standard.indels.hg38,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi"
snpeffDb = "GRCh38.86"
vepCacheVersion = "95"
}
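After this change, every `conf/igenomes.config` path sits under `Homo_sapiens/GATK/<genome>/` below `params.igenomes_base`, so a local mirror needs that directory layout. As a minimal sketch, assuming a hypothetical mirror root such as `/data/igenomes`, the bash functions below (names also hypothetical) expand the `bwaIndex` patterns into the file names they are meant to match and report which ones are missing. Bash brace expansion is used here only to enumerate the same names; Nextflow presumably resolves the `{...}` part of the config value as a file glob instead.

```bash
#!/usr/bin/env bash
# Sketch: enumerate the BWA index files that the bwaIndex patterns in
# conf/igenomes.config point at after this change, and check a local mirror.
# Function names and the mirror root passed as $1 are hypothetical.
set -euo pipefail

bwa_index_files() {
  local base="$1" genome="$2"
  local prefix="${base}/Homo_sapiens/GATK/${genome}/Sequence/BWAIndex"
  case "$genome" in
    GRCh37)
      # Same extensions as the {amb,ann,bwt,pac,sa} pattern in the config
      printf '%s\n' "${prefix}"/human_g1k_v37_decoy.fasta.{amb,ann,bwt,pac,sa}
      ;;
    GRCh38)
      # The GRCh38 pattern additionally lists an .alt file
      printf '%s\n' "${prefix}"/Homo_sapiens_assembly38.fasta.64.{alt,amb,ann,bwt,pac,sa}
      ;;
    *)
      echo "unsupported genome: ${genome}" >&2
      return 1
      ;;
  esac
}

# Print one "missing:" line per absent file; fail if any file is missing.
check_mirror() {
  local f missing=0
  while IFS= read -r f; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done < <(bwa_index_files "$1" "$2")
  [ "$missing" -eq 0 ]
}
```

For example, `check_mirror /data/igenomes GRCh38 || echo "mirror incomplete"` prints one `missing:` line per absent index file under the new `Homo_sapiens/GATK/GRCh38/` prefix.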
17 changes: 12 additions & 5 deletions docs/annotation.md
@@ -3,6 +3,7 @@
## Tools

With Sarek, annotation is done using `snpEff`, `VEP`, or even both consecutively:

- `--tools snpEff`
- To annotate using `snpEff`
- `--tools VEP`
@@ -32,6 +33,7 @@ You need to specify the cache directory using `--snpEff_cache` and `--vep_cache`
The cache will only be used when `--annotation_cache` and the cache directories are specified (either on the command line or in a configuration file).

Example:

```bash
nextflow run nf-core/sarek/main.nf --tools snpEff --step annotate --sample file.vcf.gz --snpEff_cache /Path/To/snpEffCache --annotation_cache
nextflow run nf-core/sarek/main.nf --tools VEP --step annotate --sample file.vcf.gz --vep_cache /Path/To/vepCache --annotation_cache
@@ -40,11 +42,13 @@ nextflow run nf-core/sarek/main.nf --tools VEP --step annotate --sample file.vcf
## Using VEP CADD plugin

To enable the use of the VEP CADD plugin:
- Download the CADD files
- Specify them (either on the command line, like in the example or in a configuration file)
- use the `--cadd_cache` flag

- Download the CADD files
- Specify them (either on the command line, like in the example or in a configuration file)
- use the `--cadd_cache` flag

Example:

```bash
nextflow run nf-core/sarek/main.nf --step annotate --tools VEP --sample file.vcf.gz --cadd_cache \
--cadd_InDels /PathToCADD/InDels.tsv.gz \
@@ -57,16 +61,19 @@ nextflow run nf-core/sarek/main.nf --step annotate --tools VEP --sample file.vcf

A helper script has been designed to help download CADD files.
Such files are meant to be shared between multiple users, so this script is mainly intended for server and cluster administrators and for advanced users.

```bash
nextflow run build.nf --cadd_cache /Path/To/CADDcache --cadd_version <CADD version> --genome <GENOME>
```

## Using VEP GeneSplicer plugin

To enable the use of the VEP GeneSplicer plugin:
- use the `--genesplicer` flag

- use the `--genesplicer` flag

Example:
```

```bash
nextflow run annotate.nf --tools VEP --sample file.vcf.gz --genesplicer
```