Thomas John, Ramakrishnan Naren, Bailey-Kellogg Chris
Department of Computer Science, Dartmouth College, Hanover, New Hampshire 03755, USA.
Proteins. 2009 Sep;76(4):911-29. doi: 10.1002/prot.22398.
Protein-protein interactions are mediated by complementary amino acids defining complementary surfaces. Typically not all members of a family of related proteins interact equally well with all members of a partner family; thus analysis of the sequence record can reveal the complementary amino acid partners that confer interaction specificity. This article develops methods for learning and using probabilistic graphical models of such residue "cross-coupling" constraints between interacting protein families, based on multiple sequence alignments and information about which pairs of proteins are known to interact. Our models generalize traditional consensus sequence binding motifs, and provide a probabilistic semantics enabling sound evaluation of the plausibility of new possible interactions. Furthermore, predictions made by the models can be explained in terms of the underlying residue interactions. Our approach supports different levels of prior knowledge regarding interactions, including both one-to-one (e.g., pairs of proteins from the same organism) and many-to-many (e.g., experimentally identified interactions), and we present a technique to account for possible bias in the represented interactions. We apply our approach in studies of PDZ domains and their ligands, fundamental building blocks in a number of protein assemblies. Our algorithms are able to identify biologically interesting cross-coupling constraints, to successfully identify known interactions, and to make explainable predictions about novel interactions.
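As a rough illustration of what a residue "cross-coupling" signal looks like in paired alignments, the sketch below scores every pair of columns (one from each family) by mutual information over known interacting pairs. This is a deliberately simple stand-in and not the graphical-model learning procedure of the article; the function names, the toy sequences, and the choice of mutual information as the score are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def cross_mutual_information(pairs, i, j):
    """Mutual information (bits) between column i of family A and
    column j of family B, estimated over interacting sequence pairs.

    `pairs` is a list of (seq_a, seq_b) tuples of aligned sequences.
    Hypothetical helper: MI is used here only as a simple coupling score.
    """
    n = len(pairs)
    joint = Counter((a[i], b[j]) for a, b in pairs)
    left = Counter(a[i] for a, _ in pairs)
    right = Counter(b[j] for _, b in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * log2(p_xy * n * n / (left[x] * right[y]))
    return mi


def rank_cross_couplings(pairs):
    """Score all cross-family column pairs and sort by decreasing MI."""
    len_a = len(pairs[0][0])
    len_b = len(pairs[0][1])
    scores = [
        (cross_mutual_information(pairs, i, j), i, j)
        for i in range(len_a)
        for j in range(len_b)
    ]
    return sorted(scores, reverse=True)


# Toy (made-up) interacting pairs: column 0 of A and column 0 of B
# co-vary (K with E, E with K); the remaining columns are constant.
toy_pairs = [("KG", "EV"), ("KG", "EV"), ("EG", "KV"), ("EG", "KV")]

if __name__ == "__main__":
    for score, i, j in rank_cross_couplings(toy_pairs):
        print(f"A[{i}] x B[{j}]: {score:.3f} bits")
```

In this toy input only the charge-swapped columns carry a nonzero score, which is the kind of complementary pairing the abstract describes. A real analysis would also need corrections for small sample sizes and for bias in which interactions are represented, both of which this sketch ignores.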