School of Information Science and Engineering, Central South University, Changsha, 410005, China.
IEEE Trans Nanobioscience. 2012 Dec;11(4):324-35. doi: 10.1109/TNB.2012.2197863. Epub 2012 Jun 12.
High-throughput experimental technologies, together with computational predictions, have produced large-scale interactomes for numerous organisms. Identifying protein complexes from these interactome networks is crucial for understanding the principles of cellular organization and for predicting protein functions. Protein complexes are generally modeled as dense subgraphs; however, real protein complexes do not always have highly connected topologies. In this paper, we propose a novel method for identifying protein complexes, named EPOF, which uses essential proteins and a local metric of vertex fitness. In EPOF, cliques in the subnetwork formed by the essential proteins are first taken as seeds and ordered according to their size and their number of neighbors. A protein complex is then grown from each seed by evaluating the fitness values of the seed's neighbors. Next, a similar procedure is applied to the cliques identified in the subnetwork formed by the proteins that were not clustered in the first step. When EPOF expands essential protein cliques, essential proteins have higher priority and a lower threshold; when it expands nonessential protein cliques, nonessential proteins have higher priority and a lower threshold. Finally, the set of identified complexes is output. EPOF is applied to the unweighted and weighted interaction networks of S. cerevisiae and detects many well-known protein complexes. We compare the performance of EPOF with that of ten previous algorithms: EAGLE, NFC, MCODE, DPClus, IPCA, CPM, MCL, CMC, SPICi, and Core-Attachment. Experimental results show that EPOF outperforms these competing algorithms in terms of matching with known complexes, sensitivity, specificity, F-measure, functional enrichment, and accuracy. The program and related files are available at https://github.com/gangchen/epof.
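The two-stage seed-and-extend procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the repository linked above for that). The abstract does not define the vertex fitness metric, the threshold values, the minimum seed size, or how overlapping seeds are handled, so the following are all assumptions made for illustration: the function names (`build_adjacency`, `maximal_cliques`, `fitness`, `extend`, `epof_sketch`), a fitness of the form k_in / (k_in + k_out)^alpha, the thresholds `low` and `high`, a minimum seed size of 3, and the rule that skips a seed already contained in a found complex. Only unweighted networks are handled.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj, nodes, min_size=3):
    """Bron-Kerbosch (no pivoting) on the subnetwork induced by `nodes`."""
    nodes = set(nodes)
    found = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:  # assumed minimum seed size
                found.append(frozenset(r))
            return
        for v in list(p):
            nv = adj[v] & nodes
            expand(r | {v}, p & nv, x & nv)
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return found


def fitness(adj, group, alpha=1.0):
    """Assumed fitness: internal degree over total degree ** alpha."""
    k_in = sum(len(adj[v] & group) for v in group)
    k_out = sum(len(adj[v] - group) for v in group)
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def extend(adj, seed, preferred, low, high):
    """Grow a seed; preferred proteins get priority and the lower threshold."""
    group = set(seed)
    while True:
        base = fitness(adj, group)
        frontier = set().union(*(adj[v] for v in group)) - group
        candidates = []
        for v in frontier:
            gain = fitness(adj, group | {v}) - base  # assumed vertex fitness
            threshold = low if v in preferred else high
            if gain > threshold:
                candidates.append((v in preferred, gain, v))
        if not candidates:
            return group
        # Preferred proteins first, then the largest fitness gain.
        group.add(max(candidates)[2])


def epof_sketch(adj, essential, low=0.0, high=0.05):
    """Two-stage seed-and-extend: essential cliques, then the remainder."""
    essential = set(essential) & set(adj)
    nonessential = set(adj) - essential
    complexes, clustered = [], set()
    for stage in (1, 2):
        pool = essential if stage == 1 else set(adj) - clustered
        preferred = essential if stage == 1 else nonessential
        seeds = maximal_cliques(adj, pool)

        def rank(clique):
            outside = set().union(*(adj[v] for v in clique)) - clique
            return (-len(clique), -len(outside), sorted(clique))

        # Larger seeds first, then seeds with more neighbors.
        for seed in sorted(seeds, key=rank):
            if any(seed <= c for c in complexes):
                continue  # assumed rule: skip seeds already covered
            complexes.append(extend(adj, seed, preferred, low, high))
        clustered = set().union(*complexes) if complexes else set()
    return complexes


# Toy network with hypothetical protein names; A, B, C are "essential".
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A"), ("D", "B"),
             ("X", "Y"), ("Y", "Z"), ("X", "Z"), ("C", "X")]
toy_adj = build_adjacency(toy_edges)
toy_complexes = epof_sketch(toy_adj, essential={"A", "B", "C"})
print([sorted(c) for c in toy_complexes])
```

In this sketch the asymmetric thresholds are what encode the priority rule: in the first stage an essential neighbor joins with any positive fitness gain, while a nonessential neighbor must clear the higher bar; the roles are swapped in the second stage.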