Protein docking using surface matching and supervised machine learning.

Author information

Bordner Andrew J, Gorin Andrey A

Affiliation

Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6173, USA.

Publication information

Proteins. 2007 Aug 1;68(2):488-502. doi: 10.1002/prot.21406.

Abstract

Computational prediction of protein complex structures through docking offers a means to gain a mechanistic understanding of protein interactions that mediate biological processes. This is particularly important as the number of experimentally determined structures of isolated proteins exceeds the number of structures of complexes. A comprehensive docking procedure is described in which efficient sampling of conformations is achieved by matching surface normal vectors, fast filtering for shape complementarity, clustering by RMSD, and scoring the docked conformations using a supervised machine learning approach. Contacting residue pair frequencies, residue propensities, evolutionary conservation, and shape complementarity score for each docking conformation are used as input data to a Random Forest classifier. The performance of the Random Forest approach for selecting correctly docked conformations was assessed by cross-validation using a nonredundant benchmark set of X-ray structures for 93 heterodimer and 733 homodimer complexes. The single highest-ranked docking solution was the correct (near-native) structure for slightly more than one third of the complexes. Furthermore, the fraction of highly ranked correct structures was significantly higher than the overall fraction of correct structures, for almost all complexes. A detailed analysis of the difficult-to-predict complexes revealed that the majority of the homodimer cases were explained by incorrect oligomeric state annotation. Evolutionary conservation and shape complementarity score, as well as both underrepresented and overrepresented residue types and residue pairs, were found to make the largest contributions to the overall prediction accuracy. Finally, the method was also applied to docking unbound subunit structures from a previously published benchmark set.
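The abstract lists clustering by RMSD as one step of the pipeline but does not say which clustering algorithm is used. The sketch below is a hypothetical illustration only, assuming a greedy (leader) clustering. Poses are visited in order of a per-pose score (for example, a shape-complementarity filter score). Each unassigned pose seeds a new cluster, and every unassigned pose within a cutoff RMSD joins it. It also assumes that each pose is given as the coordinates of the moving subunit in a fixed receptor frame, so RMSD is computed without superposition. The function names and the 5.0 Å default cutoff are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equal-length coordinate lists, without superposition.

    Assumption: both poses share the same fixed receptor frame, so the
    moving subunit's atoms can be compared directly.
    """
    if len(a) != len(b) or not a:
        raise ValueError("poses must have the same non-zero number of atoms")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def greedy_rmsd_clusters(poses: Sequence[Sequence[Coord]],
                         scores: Sequence[float],
                         cutoff: float = 5.0) -> List[List[int]]:
    """Hypothetical leader clustering of docked poses by RMSD.

    Poses are visited from best to worst score; each unassigned pose seeds
    a cluster that absorbs every still-unassigned pose within `cutoff`.
    Returns clusters as lists of pose indices, leader first.
    """
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    assigned = set()
    clusters: List[List[int]] = []
    for i in order:
        if i in assigned:
            continue
        # The leader itself is included, since rmsd(pose, pose) == 0.
        members = [j for j in order
                   if j not in assigned and rmsd(poses[i], poses[j]) <= cutoff]
        assigned.update(members)
        clusters.append(members)
    return clusters


# Toy usage: three two-atom poses; the first two differ by 0.5 Å RMSD.
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
]
scores = [0.9, 0.8, 0.7]
clusters = greedy_rmsd_clusters(poses, scores, cutoff=1.0)
print(clusters)  # [[0, 1], [2]]
```

In this sketch, visiting poses in score order means each cluster is represented by its best-scoring member. That reduces the set of near-duplicate poses to one candidate per distinct binding mode before a more expensive scoring step such as the classifier described above.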
