Unidade de Endocrinologia do Desenvolvimento / LIM42 / SELA, Disciplina de Endocrinologia, Hospital das Clinicas (HCFMUSP), Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, SP, BR.
Division of Metabolism, Department of Internal Medicine, Endocrinology and Diabetes, University of Michigan, Ann Arbor, United States of America.
Clinics (Sao Paulo). 2021 Jan 22;76:e2052. doi: 10.6061/clinics/2021/e2052. eCollection 2021.
Single nucleotide variants (SNVs) are the most common type of genetic variation among humans. High-throughput sequencing methods have recently characterized millions of SNVs in several thousand individuals from various populations, most of which are benign polymorphisms. Identifying rare disease-causing SNVs remains challenging and often requires functional in vitro studies. Prioritizing the most likely pathogenic SNVs is therefore of utmost importance, and several computational methods have been developed for this purpose. However, these methods are based on different assumptions and often produce discordant results. The aim of the present study was to evaluate the performance of 11 widely used, freely available pathogenicity prediction tools in identifying known pathogenic SNVs: FATHMM, Mutation Assessor, Protein Analysis Through Evolutionary Relationships (PANTHER), Sorting Intolerant From Tolerant (SIFT), MutationTaster, Polymorphism Phenotyping v2 (PolyPhen-2), Align Grantham Variation Grantham Deviation (Align-GVGD), CADD, PROVEAN, SNPs&GO, and MutPred.
We analyzed 40 functionally proven pathogenic SNVs in four different genes associated with differences in sex development (DSD): 17β-hydroxysteroid dehydrogenase 3 (HSD17B3), steroidogenic factor 1 (NR5A1), androgen receptor (AR), and luteinizing hormone/chorionic gonadotropin receptor (LHCGR). To evaluate the false discovery rate of each tool, we analyzed 36 frequent (MAF>0.01) benign SNVs found in the same four DSD genes. The quality of the predictions was analyzed using six parameters: accuracy, precision, negative predictive value (NPV), sensitivity, specificity, and Matthews correlation coefficient (MCC). Overall performance was assessed using a receiver operating characteristic (ROC) curve.
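The six parameters named above all derive from a single confusion matrix per tool. The following is a minimal sketch of how they could be computed, not the authors' actual analysis code. The function name `confusion_metrics` and the example counts (38 true positives, 16 false positives) are hypothetical and chosen only for illustration; only the group sizes of 40 pathogenic and 36 benign SNVs come from the study.

```python
from math import sqrt


def confusion_metrics(tp, fn, tn, fp):
    """Return the six prediction-quality parameters for one tool.

    tp: pathogenic SNVs predicted as pathogenic
    fn: pathogenic SNVs predicted as benign
    tn: benign SNVs predicted as benign
    fp: benign SNVs predicted as pathogenic
    """
    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
        "precision": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical tool: flags 38 of 40 pathogenic SNVs and 16 of 36 benign SNVs.
    metrics = confusion_metrics(tp=38, fn=2, tn=20, fp=16)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

A tool with this hypothetical profile would show high sensitivity but modest specificity, which is why MCC, which uses all four cells of the matrix, is a useful single summary when comparing tools.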
Our study found that none of the tools was 100% precise in identifying pathogenic SNVs. The highest specificity, precision, and accuracy were observed for Mutation Assessor, MutPred, and SNPs&GO. These three tools also performed best in the ROC curve analysis. Of the 11 tools evaluated, 6 (Mutation Assessor, PANTHER, SIFT, MutationTaster, PolyPhen-2, and CADD) exhibited sensitivity >0.90 but lower specificity (0.42-0.67). Performance based on MCC ranged from poor (FATHMM=0.04) to reasonably good (MutPred=0.66).
Computational algorithms are important tools for SNV analysis, but their correlation with functional studies is not consistent. In the present analysis, the best-performing tools (based on accuracy, precision, and specificity) were Mutation Assessor, MutPred, and SNPs&GO, which showed the best concordance with functional studies.