Kamisetty Hetunandan, Ghosh Bornika, Langmead Christopher James, Bailey-Kellogg Chris
1 Facebook Inc., Seattle, Washington.
3 Department of Computer Science, Dartmouth, Hanover, New Hampshire.
J Comput Biol. 2015 Jun;22(6):474-86. doi: 10.1089/cmb.2014.0289. Epub 2015 May 14.
In studying the strength and specificity of interaction between members of two protein families, key questions center on which pairs of possible partners actually interact, how well they interact, and why they interact while others do not. The advent of large-scale experimental studies of interactions between members of a target family and a diverse set of possible interaction partners offers the opportunity to address these questions. We develop here a method, DgSpi (data-driven graphical models of specificity in protein:protein interactions), for learning and using graphical models that explicitly represent the amino acid basis for interaction specificity (why) and extend earlier classification-oriented approaches (which) to predict the ΔG of binding (how well). We demonstrate the effectiveness of our approach in analyzing and predicting interactions between a set of 82 PDZ recognition modules and a panel of 217 possible peptide partners, based on data from MacBeath and colleagues. Our predicted ΔG values correlate with the experimentally measured ones, reaching correlation coefficients of 0.69 in 10-fold cross-validation and 0.63 in leave-one-PDZ-out cross-validation. Furthermore, the model serves as a compact representation of the amino acid constraints underlying the interactions, enabling protein-level ΔG predictions to be naturally understood in terms of residue-level constraints. Finally, the DgSpi model readily enables the design of new interacting partners, and we demonstrate that designed ligands are novel and diverse.
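The two evaluation schemes named in the abstract differ in what is held out: 10-fold cross-validation holds out random domain-peptide pairs, whereas leave-one-PDZ-out holds out every pair involving one domain, so the model is tested on a domain it has never seen. The following is a minimal sketch of that difference, not the authors' code. All function names, the toy domain labels, and the choice to pool held-out predictions before computing a Pearson correlation are assumptions made for illustration; the abstract does not say how the reported coefficients were aggregated.

```python
import math
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def kfold_splits(n_items, k, seed=0):
    """Yield (train, test) index lists; items are shuffled and dealt into k folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def leave_one_group_out(groups):
    """Yield (train, test) index lists; each test set is all items of one group."""
    by_group = defaultdict(list)
    for j, g in enumerate(groups):
        by_group[g].append(j)
    for test in by_group.values():
        held = set(test)
        train = [j for j in range(len(groups)) if j not in held]
        yield train, test


def pooled_correlation(splits, fit, measured):
    """Pool held-out predictions over all splits, then correlate with measurements.

    `fit(train)` is a hypothetical callback that trains on the given indices and
    returns a function mapping an item index to a predicted value.
    """
    preds, truth = [], []
    for train, test in splits:
        predict = fit(train)
        preds.extend(predict(j) for j in test)
        truth.extend(measured[j] for j in test)
    return pearson(preds, truth)


# Toy layout: 3 hypothetical domains x 4 hypothetical peptides (not data from the paper).
pairs = [(d, p) for d in ("pdzA", "pdzB", "pdzC") for p in range(4)]
domains = [d for d, _ in pairs]

for train, test in leave_one_group_out(domains):
    held_out = {domains[j] for j in test}
    # Exactly one domain is held out, and it never appears in the training set.
    assert len(held_out) == 1
    assert held_out.isdisjoint(domains[j] for j in train)
```

With random folds, the other peptides measured against a test pair's domain usually remain in the training set, which makes the task easier than predicting for a domain with no training data at all.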