Grigoryan Gevorg, Keating Amy E
MIT Department of Biology, Cambridge, MA 02139, USA.
J Mol Biol. 2006 Feb 3;355(5):1125-42. doi: 10.1016/j.jmb.2005.11.036. Epub 2005 Dec 1.
Predicting protein interaction specificity from sequence is an important goal in computational biology. We present a model for predicting the interaction preferences of coiled-coil peptides derived from bZIP transcription factors that performs very well when tested against experimental protein microarray data. We used only sequence information to build atomic-resolution structures for 1711 dimeric complexes, and evaluated these with a variety of functions based on physics, learned empirical weights or experimental coupling energies. A purely physical model, similar to those used for protein design studies, gave reasonable performance. The results were improved significantly when helix propensities were used in place of a structurally explicit model to represent the unfolded reference state. Further improvement resulted from accounting for residue-residue interactions in competing states in a generic way. Purely physical structure-based methods had difficulty capturing core interactions accurately, especially those involving polar residues such as asparagine. When these terms were replaced with weights from a machine-learning approach, the resulting model was able to correctly order the stabilities of over 6000 pairs of complexes with greater than 90% accuracy. The final model is physically interpretable, and suggests specific pairs of residues that are important for bZIP interaction specificity. Our results illustrate the power and potential of structural modeling as a method for predicting protein interactions and highlight obstacles that must be overcome to reach quantitative accuracy using a de novo approach. Our method shows unprecedented performance in accurately predicting protein-protein interaction specificity using structural modeling, and suggests that predicting coiled-coil interactions in general may be within reach.