Gomez Shawn M, Noble William Stafford, Rzhetsky Andrey
Unité de Biochimie et Biologie Moléculaire des Insectes, Institut Pasteur, 75724 Paris Cedex 15, France.
Bioinformatics. 2003 Oct 12;19(15):1875-81. doi: 10.1093/bioinformatics/btg352.
To understand the molecular machinery of the cell, we need to know about the multitude of protein-protein interactions that allow the cell to function. High-throughput technologies provide some data about these interactions, but so far those data are fairly noisy. Computational techniques for predicting protein-protein interactions could therefore be of significant value. One approach to predicting interactions in silico is to build a detailed model of a candidate interaction from first principles. We take an alternative approach, employing a relatively simple model that learns dynamically from a large collection of data. In this work, we describe an attraction-repulsion model, in which the interaction between a pair of proteins is represented as the sum of attractive and repulsive forces associated with small, domain- or motif-sized features along the length of each protein. The model is discriminative, learning simultaneously from known interactions and from pairs of proteins that are known (or suspected) not to interact. The model is efficient to compute and scales well to very large collections of data. In a cross-validated comparison using known yeast interactions, the attraction-repulsion method performs better than several competing techniques.
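The attraction-repulsion idea can be illustrated with a minimal sketch. The abstract does not give the model's equations, so everything here is an assumption made for illustration: each protein is reduced to a set of domain or motif labels, each pair of features gets a log-ratio weight estimated from counts in the interacting and non-interacting training pairs, and a candidate pair is scored by summing those weights. The names `feature_pairs`, `train`, `score`, and `pseudocount` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def feature_pairs(features_a, features_b):
    """Return the unordered feature pairs formed between two proteins."""
    return {tuple(sorted(pair)) for pair in product(features_a, features_b)}


def train(positives, negatives, pseudocount=1.0):
    """Estimate one weight per feature pair (illustrative assumption).

    positives: protein pairs known to interact.
    negatives: protein pairs known or suspected not to interact.
    Each protein is a set of feature labels.
    """
    attract = Counter()
    repel = Counter()
    for a, b in positives:
        attract.update(feature_pairs(a, b))
    for a, b in negatives:
        repel.update(feature_pairs(a, b))

    # Weight > 0 acts as attraction, weight < 0 as repulsion.
    # The pseudocount keeps the ratio finite for one-sided counts.
    return {
        pair: math.log((attract[pair] + pseudocount) / (repel[pair] + pseudocount))
        for pair in set(attract) | set(repel)
    }


def score(weights, features_a, features_b):
    """Sum attractive and repulsive contributions for a candidate pair.

    Feature pairs never seen in training contribute nothing.
    """
    return sum(weights.get(pair, 0.0) for pair in feature_pairs(features_a, features_b))


if __name__ == "__main__":
    # Toy data with made-up feature labels.
    positives = [({"SH3"}, {"PRM"}), ({"SH3", "kinase"}, {"PRM"})]
    negatives = [({"kinase"}, {"zf"})]
    weights = train(positives, negatives)
    print(score(weights, {"SH3"}, {"PRM"}))    # positive: net attraction
    print(score(weights, {"kinase"}, {"zf"}))  # negative: net repulsion
```

Because training only counts feature pairs and scoring only sums looked-up weights, a scheme of this kind stays cheap on large data sets, which is in the spirit of the efficiency claim in the abstract. Using both positive and negative pairs in `train` mirrors the discriminative setup the abstract describes, though the actual estimator used by the authors may differ.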