Jin Jingjie, Chen Zixi, Liu Jinchao, Du Hongli, Zhang Gong
Key Laboratory of Functional Protein Research, Guangdong Higher Education Institutes, Jinan University, Guangzhou, China.
MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China.
Front Genet. 2022 Nov 15;13:979928. doi: 10.3389/fgene.2022.979928. eCollection 2022.
Accurate and robust somatic mutation detection is essential for cancer treatment, diagnostics and research. Different analysis pipelines give different results and should therefore be evaluated systematically. In this study, we benchmarked five commonly used somatic mutation calling pipelines (VarScan, VarDictJava, Mutect2, Strelka2 and FANSe) for precision, recall and speed, using standard benchmarking datasets based on a series of real-world whole-exome sequencing datasets. All five pipelines showed very high precision in all cases, and high recall for mutations with frequencies above 10%. For low-frequency mutations, however, the pipelines differed widely. FANSe showed the highest accuracy (especially sensitivity) in all cases, and VarScan and VarDictJava outperformed Mutect2 and Strelka2 on low-frequency mutations at all sequencing depths. Flaws in the filtering steps were the major cause of the low sensitivity of the four pipelines other than FANSe. In terms of speed, the FANSe pipeline was 8.8 to 19 times faster than the other pipelines. Our benchmarking results characterize the performance of these somatic mutation calling pipelines and provide a reference for choosing among them in cancer applications.
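As an illustration of the two accuracy metrics named above, the following minimal Python sketch computes precision and recall for one call set against a truth set. It is not the evaluation code used in the study. The function name `precision_recall`, the representation of a variant as a `(chromosome, position, ref, alt)` tuple, and the example variants are all assumptions made for illustration.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for a set of called variants.

    Both arguments are iterables of hashable variant keys; here we assume
    (chromosome, position, ref, alt) tuples. This is an illustrative
    convention, not a format taken from the study.
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)    # called and present in the truth set
    false_pos = len(called - truth)   # called but absent from the truth set
    false_neg = len(truth - called)   # in the truth set but missed by the caller
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    return precision, recall


# Hypothetical toy data: four true mutations, three calls, one of them wrong.
truth = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 50, "C", "T"),
    ("chr3", 75, "T", "G"),
}
called = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr5", 10, "A", "G"),
}

precision, recall = precision_recall(called, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case two of the three calls are correct and two of the four true mutations are recovered, so the precision is 2/3 and the recall (sensitivity) is 1/2.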