Prediction of phenotypes of missense mutations in human proteins from biological assemblies.

Affiliation

Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.

Publication information

Proteins. 2013 Feb;81(2):199-213. doi: 10.1002/prot.24176. Epub 2012 Nov 5.

Abstract

Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins.
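As a rough illustration of the structural feature described above, the sketch below labels a residue as core, interface, or surface by comparing its relative accessible surface area in the isolated monomer with that in the biological assembly. This is not the authors' procedure: the 25% burial cutoff, the `ResidueASA` record, and the rule "exposed in the monomer but buried in the assembly means interface" are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical cutoff (an assumption, not a value from the paper): a residue whose
# relative ASA falls below this fraction is treated as buried.
BURIED_CUTOFF = 0.25


@dataclass
class ResidueASA:
    max_asa: float       # reference ASA of the fully exposed residue type (A^2), must be > 0
    monomer_asa: float   # ASA computed on the isolated chain (A^2)
    assembly_asa: float  # ASA computed within the biological assembly (A^2)


def relative_asa(asa: float, max_asa: float) -> float:
    """Return ASA as a fraction of the reference maximum, capped at 1.0."""
    return min(asa / max_asa, 1.0)


def classify_location(res: ResidueASA, cutoff: float = BURIED_CUTOFF) -> str:
    """Label a residue 'core', 'interface', or 'surface' (illustrative rule only)."""
    rel_monomer = relative_asa(res.monomer_asa, res.max_asa)
    rel_assembly = relative_asa(res.assembly_asa, res.max_asa)
    if rel_monomer < cutoff:
        # Already buried in the single chain.
        return "core"
    if rel_assembly < cutoff:
        # Exposed in the monomer, buried only once partner chains are present.
        return "interface"
    return "surface"
```

Under this rule, a monomer-only calculation would label every interface residue as surface, which is the kind of case where assembly-level accessible surface area could carry additional information.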


