Computational Biology & Bioinformatics - i12, Informatics, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching/Munich, Germany.
Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ, 08901, USA.
Sci Rep. 2017 May 9;7(1):1608. doi: 10.1038/s41598-017-01054-2.
Any two unrelated individuals differ by about 10,000 single amino acid variants (SAVs). Do these impact molecular function? Experiments cannot answer this question comprehensively, while state-of-the-art prediction methods can. We predicted the functional impacts of SAVs within humans and of variants between human and other species. Several surprising results stood out. Firstly, four methods (CADD, PolyPhen-2, SIFT, and SNAP2) agreed within 10 percentage points on the percentage of rare SAVs predicted with effect. However, they differed substantially for the common SAVs: SNAP2 predicted, on average, more effect for common than for rare SAVs. Given the large ExAC data sets sampling 60,706 individuals, the differences were extremely significant (p-value < 2.2e-16). We provided evidence that SNAP2 might be closer to reality for common SAVs than the other methods, due to its different focus in development. Secondly, we predicted significantly higher fractions of SAVs with effect between healthy individuals than between species; the difference increased for more distantly related species. The same trends were maintained for subsets of only housekeeping proteins and when moving from exomes of 1,000 to 60,000 individuals. SAVs frozen at speciation might maintain protein function, while many variants within a species might bring about crucial changes, for better or worse.
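The comparison of "fraction of SAVs predicted with effect" between rare and common variants can be illustrated with a minimal Python sketch. The abstract does not say which statistical test produced the reported p-value, so the pooled two-proportion z-test below is an assumption chosen for illustration. The counts are made-up toy numbers, not values from the paper, and the names `Counts` and `two_proportion_z_test` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    """Number of SAVs predicted with effect, out of all SAVs in a class."""
    effect: int
    total: int

    @property
    def fraction(self) -> float:
        return self.effect / self.total


def two_proportion_z_test(a: Counts, b: Counts) -> tuple[float, float]:
    """Pooled two-sided z-test for a difference between two proportions.

    Returns (z, p_value). A positive z means a.fraction > b.fraction.
    """
    pooled = (a.effect + b.effect) / (a.total + b.total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / a.total + 1 / b.total))
    z = (a.fraction - b.fraction) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Toy counts for illustration only (NOT data from the paper).
common = Counts(effect=5200, total=10000)
rare = Counts(effect=4500, total=10000)

z, p_value = two_proportion_z_test(common, rare)
print(f"common: {common.fraction:.2%}, rare: {rare.fraction:.2%}")
print(f"z = {z:.2f}, p = {p_value:.3g}")
```

With samples this large, even a difference of a few percentage points between the two fractions yields a very small p-value, which is consistent with the abstract's remark that the size of the ExAC data makes the differences extremely significant.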