Zeng Erliang, Ding Chris, Narasimhan Giri, Holbrook Stephen R
Bioinformatics Research Group, School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA.
Comput Syst Bioinformatics Conf. 2008;7:73-84.
Almost every cellular process requires interactions between pairs of proteins or within larger protein complexes. High-throughput protein-protein interaction (PPI) data have been generated using techniques such as yeast two-hybrid systems and mass spectrometry. Such data provide a new perspective for predicting protein functions and for building protein-protein interaction networks, and many recent algorithms have been developed for this purpose. However, PPI data generated using high-throughput techniques contain a large number of false positives. In this paper, we propose a novel method to evaluate the support for PPI data based on Gene Ontology information. When the semantic similarity between genes is computed from Gene Ontology information using Resnik's formula, our results show that the PPI data can be modeled as a mixture model, predicated on the assumption that true protein-protein interactions have higher support than the false positives in the data. Semantic similarity between genes thus serves as a metric of support for PPI data. Taking this one step further, we propose new function prediction approaches that make use of this support metric. These new function prediction approaches outperform their conventional counterparts. New evaluation methods are also proposed.
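As an illustration of the support metric, Resnik's formula scores two ontology terms by the information content, -log p(t), of their most informative common ancestor, where p(t) is the fraction of annotations attached to t or any of its descendants. The sketch below is not the authors' implementation. The ontology, term names, and annotation counts are hypothetical toy values, and taking the maximum over term pairs to obtain a gene-level score is an assumption that the abstract does not specify.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO identifiers).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 4,
    "dna_binding": 1,
    "rna_binding": 1,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities():
    """p(t): fraction of annotations attached to t or to any descendant of t."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term also counts for every ancestor of that term.
        for anc in ancestors(term):
            cumulative[anc] += count
    return {term: count / total for term, count in cumulative.items()}


P = term_probabilities()


def resnik_term(t1, t2):
    """Resnik similarity: highest information content among common ancestors."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(P[t]) for t in common)


def resnik_gene(terms1, terms2):
    """Gene-level score as the best-matching term pair (an assumed choice)."""
    return max(resnik_term(a, b) for a in terms1 for b in terms2)


if __name__ == "__main__":
    print(resnik_term("dna_binding", "rna_binding"))
    print(resnik_gene({"dna_binding"}, {"rna_binding", "catalysis"}))
```

Under the abstract's assumption, an interacting pair whose genes share only the root term scores zero and would be weakly supported, while a pair sharing a specific, rarely used term scores high. Fitting the mixture model to such scores is a separate step and is not shown here.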