Leung Henry C M, Xiang Qian, Yiu S M, Chin Francis Y L
Department of Computer Science, University of Hong Kong, Hong Kong.
J Comput Biol. 2009 Feb;16(2):133-44. doi: 10.1089/cmb.2008.01TT.
Protein complexes play a critical role in many biological processes. Identifying the component proteins of a complex is an important step toward understanding the complex and its related biological activities. This paper addresses the problem of computationally predicting protein complexes from the protein-protein interaction (PPI) network of a single species. Most previous methods rely on the assumption that proteins within the same complex have relatively more interactions with one another, which translates into dense subgraphs in the PPI network. However, the existing software tools have had limited success. Recently, Gavin et al. (2006) provided a detailed study of the organization of protein complexes and suggested that a complex consists of two parts: a core and an attachment. Based on this core-attachment concept, we developed a novel approach that identifies complexes from the PPI network by identifying their cores and attachments separately. We evaluated the effectiveness of our approach on three different datasets and compared the quality of our predicted complexes with that of the complexes predicted by three existing tools. The results show that our approach predicts many more complexes than these tools and with higher accuracy, an improvement of over 30%. To verify the cores we identified in each complex, we compared our cores with the mediators produced by Andreopoulos et al. (2007), which were claimed to be the cores, based on the benchmark results produced by Gavin et al. (2006). We found that our cores are of much higher quality, with 10- to 30-fold more correctly predicted cores and better accuracy.
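The core-attachment idea can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes, purely for illustration, that a core is a maximal clique of at least `min_core_size` proteins and that an attachment is any protein outside the core that interacts with more than `attach_fraction` of the core members. The names `maximal_cliques`, `predict_complexes`, `min_core_size`, and `attach_fraction` are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def predict_complexes(edges, min_core_size=3, attach_fraction=0.5):
    """Return (core, attachments) pairs from a list of PPI edges.

    Assumption (not from the paper): cores are maximal cliques, and a
    protein is an attachment if it interacts with more than
    attach_fraction of the core members.
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    complexes = []
    for core in maximal_cliques(adj):
        if len(core) < min_core_size:
            continue
        # Candidate attachments: neighbours of the core that are outside it.
        candidates = set().union(*(adj[v] for v in core)) - core
        attachments = {
            u for u in candidates
            if len(adj[u] & core) > attach_fraction * len(core)
        }
        complexes.append((frozenset(core), frozenset(attachments)))
    return complexes
```

Identifying the core first and the attachments second mirrors the separation described above. A real implementation would need a more tolerant definition of a dense core, because PPI data are noisy and true cores are rarely perfect cliques.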