Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA.
Bioinformatics. 2012 Aug 15;28(16):2093-6. doi: 10.1093/bioinformatics/bts336. Epub 2012 Jun 8.
Site-directed mutagenesis is frequently used to investigate the functional impact of amino acid mutations in the laboratory. More than 10,000 such laboratory-induced mutations have been reported in the UniProt database, along with the outcomes of functional assays. Here, we evaluate how well state-of-the-art computational tools (Condel, PolyPhen-2 and SIFT) annotate the function-altering potential of 10,913 laboratory-induced mutations from 2372 proteins. We find that these tools are very successful at diagnosing laboratory-induced mutations that elicit a significant functional change in the laboratory (up to 92% accuracy). However, they consistently fail to correctly annotate laboratory-induced mutations that show no functional impact in laboratory assays. Consequently, the overall accuracy of computational tools for laboratory-induced mutations is much lower than that observed for naturally occurring human variants. We tested and rejected two possible explanations for this discordance between the tools' performance on natural and on laboratory mutations: the preponderance of substitutions to alanine, and the presence of multiple-base-pair mutations, among laboratory-induced mutations. Instead, we found that laboratory-induced mutations occur predominantly at highly conserved positions in proteins, where computational tools have the lowest accuracy in correctly predicting variants that do not impact function (neutral variants). Therefore, comparisons of experimental profiling results with computational predictions need to account for the evolutionary conservation of the positions harboring the amino acid change.
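The argument above rests on scoring accuracy separately for mutations that alter function in the assay and for neutral ones. Overall accuracy is the class-size-weighted mean of those two values, so poor performance on the neutral class drags the total down even when the function-altering class is handled well. A further step stratifies neutral accuracy by positional conservation. The following is a minimal sketch of that bookkeeping under stated assumptions: the `Mutation` record, the binarized tool call, the 0-to-1 conservation score, and the 0.8 "conserved" threshold are all hypothetical illustrations, not the study's actual scoring scheme. The toy records are invented and are not data from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for one laboratory-induced mutation."""
    altered_in_assay: bool    # True if the assay showed a significant functional change
    predicted_damaging: bool  # True if the tool (binarized output, an assumption) calls it damaging
    conservation: float       # hypothetical positional conservation score, 0 = variable, 1 = invariant


def accuracy(records):
    """Fraction of records where the tool call matches the assay outcome (NaN if empty)."""
    if not records:
        return float("nan")
    correct = sum(r.altered_in_assay == r.predicted_damaging for r in records)
    return correct / len(records)


def class_accuracies(records):
    """Accuracy on function-altering and neutral mutations separately, plus overall."""
    altered = [r for r in records if r.altered_in_assay]
    neutral = [r for r in records if not r.altered_in_assay]
    return {
        "function-altering": accuracy(altered),
        "neutral": accuracy(neutral),
        "overall": accuracy(records),
    }


def neutral_accuracy_by_conservation(records, threshold=0.8):
    """Accuracy on neutral mutations, split at a hypothetical conservation threshold."""
    bins = defaultdict(list)
    for r in records:
        if not r.altered_in_assay:
            key = "conserved" if r.conservation >= threshold else "variable"
            bins[key].append(r)
    return {key: accuracy(group) for key, group in bins.items()}


if __name__ == "__main__":
    # Invented toy records, for illustration only.
    toy = [
        Mutation(True, True, 0.95),
        Mutation(True, True, 0.90),
        Mutation(False, True, 0.92),   # neutral at a conserved site, wrongly called damaging
        Mutation(False, False, 0.30),  # neutral at a variable site, correctly called neutral
    ]
    print(class_accuracies(toy))
    # {'function-altering': 1.0, 'neutral': 0.5, 'overall': 0.75}
    print(neutral_accuracy_by_conservation(toy))
    # {'conserved': 0.0, 'variable': 1.0}
```

In this toy set, both function-altering mutations are called correctly, yet overall accuracy falls to 0.75 because the neutral mutation at the conserved site is misclassified. This mirrors, in miniature, the pattern the abstract describes. A real analysis would substitute the tools' actual scores and cutoffs and a proper conservation measure.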