Keystone Symposia - Precision Medicine in Cancer

Sarek, a workflow for WGS analysis of germline and somatic mutations

Maxime Garcia (1,2,3)*, Szilveszter Juhos (1,2,3)*, Malin Larsson (4,5,6), Teresita Díaz de Ståhl (1,3), Johanna Sandgren (1,3), Jesper Eisfeldt (7,3), Sebastian DiLorenzo (8,5,A), Marcel Martin (B,3,C), Pall Olason (9,5,A), Phil Ewels (B,2,C), Björn Nystedt (9,5,A)*, Monica Nistér (1,3), Max Käller (2,D)

*Corresponding author

  1. Barntumörbanken, Dept. of Oncology Pathology;
  2. Science for Life Laboratory;
  3. Karolinska Institutet;
  4. Dept. of Physics, Chemistry and Biology;
  5. National Bioinformatics Infrastructure Sweden, Science for Life Laboratory;
  6. Linköping University;
  7. Clinical Genetics, Dept. of Molecular Medicine and Surgery;
  8. Dept. of Medical Sciences;
  9. Dept. of Cell and Molecular Biology;
  A. Uppsala University;
  B. Dept. of Biochemistry and Biophysics;
  C. Stockholm University;
  D. School of Biotechnology, Division of Gene Technology, Royal Institute of Technology

We present Sarek, a complete open-source pipeline for resolving germline and somatic variants from whole-genome sequencing (WGS) data. It is written in Nextflow, a domain-specific language for building workflows. Sarek follows the GATK best practices to prepare short-read data, processing the tumor and normal samples of a pair in parallel. After these preprocessing steps, several variant callers scan the resulting BAM files: Manta is used for structural variants, Strelka and GATK HaplotypeCaller for germline variants, and MuTect2 and Strelka for somatic calls. Finally, we apply ASCAT to estimate sample heterogeneity, ploidy and CNVs. Checkpoints allow the workflow to be restarted from different stages. At the end of the analysis, the resulting VCF files are annotated to facilitate downstream processing. The workflow can accommodate additional variant callers. It can process a normal sample alone, a tumor/normal pair, or a normal sample together with a tumor and several relapse samples. Besides variant calls, the workflow provides quality-control metrics presented with MultiQC. For easy sharing and installation, and to ensure reproducibility, containers (Docker and Singularity) are available. The MIT-licensed open-source code can be downloaded from GitHub.
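
As a rough illustration of the checkpoint idea described in the abstract, the Python sketch below models the workflow as an ordered list of stages and shows how resuming from a checkpoint skips the earlier ones. It is not Sarek's Nextflow code: the stage names, their exact ordering, and the helpers `stages_from` and `register_caller` are hypothetical and chosen only for illustration. The caller names are the ones listed in the abstract.

```python
# Illustrative sketch only. Stage names and helper functions are hypothetical;
# they are not Sarek's actual Nextflow processes or command-line options.
STAGES = ["preprocessing", "variant_calling", "annotation"]

# Tools named in the abstract, grouped by the kind of result they produce.
CALLERS = {
    "germline": ["HaplotypeCaller", "Strelka"],
    "somatic": ["MuTect2", "Strelka"],
    "structural": ["Manta"],
    "copy_number": ["ASCAT"],
}


def stages_from(checkpoint):
    """Return the stages that still have to run when resuming at `checkpoint`."""
    if checkpoint not in STAGES:
        raise ValueError(f"unknown checkpoint: {checkpoint!r}")
    return STAGES[STAGES.index(checkpoint):]


def register_caller(kind, tool):
    """Add a further caller to a group, mimicking the extensibility claim."""
    tools = CALLERS.setdefault(kind, [])
    if tool not in tools:
        tools.append(tool)


if __name__ == "__main__":
    # Resuming after preprocessing skips the alignment-related work.
    print(stages_from("variant_calling"))
    # "ExampleCaller" is a made-up name, not a real tool.
    register_caller("somatic", "ExampleCaller")
    print(CALLERS["somatic"])
```

In this toy model a checkpoint is simply an index into the stage list. A real workflow engine also has to keep track of the intermediate files (for example the BAM files produced by preprocessing) that the later stages consume.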

The authors thank the Swedish Childhood Cancer Foundation for funding Barntumörbanken. We would like to acknowledge support from Science for Life Laboratory, the National Genomics Infrastructure (NGI), and UPPMAX for assistance with massively parallel sequencing and computational infrastructure.