Maxime Garcia 1, Szilveszter Juhos 2, Malin Larsson 3, Teresita Díaz de Ståhl 4, Jesper Eisfeldt 5, Sebastian DiLorenzo 6, Pall Olason 7, Björn Nystedt 7, Monica Nistér 4, Max Käller 8
- BarnTumörBanken, Department of Oncology Pathology, Science for Life Laboratory, Karolinska Institutet
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University
- Science for Life laboratory, Department of Physics, Chemistry and Biology, Linköping University
- BarnTumörBanken, Department of Oncology Pathology, Karolinska Institutet
- Clinical Genetics, Department of Molecular Medicine and Surgery, Karolinska Institutet
- Department of Medical Sciences, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University
- Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Royal Institute of Technology
As whole genome sequencing is getting cheaper, it is viable to compare NGS data from normal and tumor samples of numerous patients. There are still many challenges, mostly regarding bioinformatics: datasets are huge, workflows are complex, and there are multiple tools to choose from for somatic and structural variants and quality control.
We are presenting CAW (Cancer Analysis Workflow) a complete open source pipeline to resolve somatic variants from WGS data: it is written in Nextflow, a domain specific language for workflow building. We are utilizing GATK best practices to align, realign and recalibrate short-read data in parallel for both tumor and normal sample. After these preprocessing steps several somatic variant callers scan the resulting BAM files; MuTect1, MuTect2 and Strelka are used to find somatic SNVs and small indels.For structural variants we use Manta. Furthermore, we are applying ASCAT to estimate sample heterogeneity, ploidy and CNVs.
The software can start the analysis from raw FASTQ files, from the realignment step, or directly with any subset of variant callers. At the end of the analysis the resulting VCF files are merged to facilitate further downstream processing, though the individual results are also retained. The flow is capable of accommodating further variant calling software or CNV callers. It is also prepared to process normal - tumor - and several relapse samples.
Besides variant calls, the workflow provides quality controls presented by MultiQC. A docker image is also available, the open source software can be downloaded from https://github.com/SciLifeLab/CAW .