Department of Oncology-Pathology, Karolinska Institutet, J5:30 BioClinicum, Visionsgatan 4, Karolinska University Hospital at Solna, Solna, 17164, Sweden.
Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, Solna, 17121, Sweden.
F1000Res. 2020 Jan 29;9:63. doi: 10.12688/f1000research.16665.2. eCollection 2020.
Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it well suited for deployment on any POSIX-compatible computer and in cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, and structural variants, as well as for the estimation of tumour sample purity and of variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation, and installation instructions are freely available at https://github.com/nf-core/sarek and at https://nf-co.re/sarek/.
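As an illustration of the portability claim, the following minimal sketch assembles a Nextflow launch command for Sarek under one of the three back ends named in the abstract (Docker, Singularity, or Conda). It is not taken from the Sarek documentation: the helper `build_sarek_command`, the file name `samples.tsv`, and the use of `--input` as the sample-sheet parameter are assumptions made for illustration and should be checked against the documentation of the installed Sarek release.

```python
import shlex

# Back ends named in the abstract; each is assumed to map to a
# Nextflow profile of the same name.
SUPPORTED_PROFILES = ("docker", "singularity", "conda")


def build_sarek_command(profile, samplesheet):
    """Return the argument list for launching nf-core/sarek with one profile.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if profile not in SUPPORTED_PROFILES:
        raise ValueError(f"unsupported profile: {profile!r}")
    return [
        "nextflow", "run", "nf-core/sarek",
        "-profile", profile,
        # Assumed parameter name for the sample sheet; verify in the docs.
        "--input", samplesheet,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before being run by hand.
    print(shlex.join(build_sarek_command("docker", "samples.tsv")))
```

Only the profile name changes between back ends, so the same sample sheet and workflow revision can in principle be reused across a workstation, a cluster, and a cloud environment.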