Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China.
National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China.
Nucleic Acids Res. 2018 Sep 6;46(15):7793-7804. doi: 10.1093/nar/gky678.
With the expanding application of next-generation sequencing in medical genetics, an increasing number of computational methods are being developed to predict the pathogenicity of missense variants. Selecting the optimal methods can accelerate the identification of candidate genes. However, the performance of these methods under various conditions has not been comprehensively evaluated. Here, we compared 12 performance measures of 23 methods using three independent benchmark datasets: (i) clinical variants related to genetic diseases from the ClinVar database, (ii) somatic variants related to human cancers from the IARC TP53 and ICGC databases and (iii) experimentally evaluated PPARG variants. Some methods performed differently under different conditions, suggesting that they are not universally applicable. In addition, for most methods the specificity was lower than the sensitivity (especially for the experimentally evaluated benchmark data), suggesting that more stringent cutoff values are needed to distinguish pathogenic variants. REVEL, VEST3 and their combination (i.e. ReVe) showed the best overall performance across all the benchmark data. Finally, we evaluated the performance of these methods on de novo mutations and found that ReVe consistently performed best. We have summarized the performance of the different methods under various conditions, providing tentative guidance for optimal tool selection.
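To illustrate why a more stringent cutoff trades sensitivity for specificity, the sketch below computes both measures for a single score-based predictor at two cutoffs. It is a minimal illustration, not the study's evaluation pipeline: the function name `sens_spec`, the convention that a score at or above the cutoff is called pathogenic, and the toy scores and labels are all hypothetical assumptions, not data from the benchmark datasets.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              cutoff: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one predictor at one cutoff.

    Assumption: a variant is predicted pathogenic when score >= cutoff.
    labels: 1 = pathogenic (positive), 0 = benign (negative).
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_pathogenic = score >= cutoff
        if label == 1:
            if predicted_pathogenic:
                tp += 1  # pathogenic variant correctly flagged
            else:
                fn += 1  # pathogenic variant missed
        else:
            if predicted_pathogenic:
                fp += 1  # benign variant wrongly flagged
            else:
                tn += 1  # benign variant correctly passed
    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants detected
    specificity = tn / (tn + fp)  # fraction of benign variants rejected
    return sensitivity, specificity


# Hypothetical toy scores in [0, 1] with known labels (not study data).
scores = [0.95, 0.85, 0.70, 0.60, 0.65, 0.55, 0.40, 0.20]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(sens_spec(scores, labels, 0.5))  # lenient cutoff: (1.0, 0.5)
print(sens_spec(scores, labels, 0.7))  # stricter cutoff: (0.75, 1.0)
```

On these toy data the lenient cutoff reproduces the pattern described above (sensitivity higher than specificity), while raising the cutoff removes the false positives at the cost of missing one pathogenic variant.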