Department of Tumor Biology, Institute of Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, 0310, Oslo, Norway.
DNV GL, 1363, Høvik, Norway.
Sci Rep. 2020 Nov 19;10(1):20222. doi: 10.1038/s41598-020-77218-4.
Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identifying causal variants in a spectrum of genetic disorders, and have provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, a systematic performance comparison of these pipelines is needed to guide the application of WGS in scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using Genome in a Bottle Consortium, "synthetic-diploid" and simulated WGS datasets. DRAGEN and DeepVariant show better accuracy than GATK in SNP and indel calling, with no significant difference between their F1-scores. The DRAGEN platform offers accuracy, flexibility and highly efficient execution speed, and therefore superior performance in large-scale analysis of WGS data. The combination of DRAGEN and DeepVariant also offers a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications.
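For readers unfamiliar with the metric, the F1-score mentioned above is conventionally the harmonic mean of precision and recall, computed from counts of true-positive, false-positive and false-negative variant calls against a truth set. The sketch below illustrates that standard definition only; the function name and the example counts are hypothetical and are not taken from the study, and the abstract does not state which benchmarking tool produced the reported scores.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from variant-call counts.

    tp: calls that match the truth set
    fp: calls absent from the truth set
    fn: truth-set variants that were not called
    """
    # Guard against empty denominators (e.g. a pipeline that made no calls).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only (not results from the paper).
precision, recall, f1 = precision_recall_f1(tp=990, fp=10, fn=10)
print(f"precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

In a benchmark of this kind, the counts would typically be computed separately for SNPs and indels, so that each pipeline gets one F1-score per variant type.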