IEEE/ACM Trans Comput Biol Bioinform. 2020 May-Jun;17(3):777-787. doi: 10.1109/TCBB.2019.2897769. Epub 2019 Feb 6.
Identifying protein complexes is helpful for understanding cellular functions and designing drugs. Over the past decades, many computational methods have been proposed that detect dense subgraphs or subnetworks in Protein-Protein Interaction Networks (PINs). However, because most of these methods rely mainly on topological information to partition the network, the high rate of false-positive and false-negative interactions in PINs prevents them from achieving satisfactory detection results directly from PINs. In this paper, we propose a new approach to protein complex detection that merges the topological information of PINs with the functional information of proteins. We first use FunCat data to split proteins into a number of groups from the perspective of protein function. Then, for each resulting protein group, we calculate two protein-protein similarity matrices, one computed by graph embedding over the PIN and the other from Gene Ontology (GO) terms, and combine them into an integrated similarity matrix. Next, we cluster the proteins in each group based on the corresponding integrated similarity matrix, obtaining a number of small protein clusters. We map these clusters onto the PIN to obtain a number of connected subgraphs. After a round of merging overlapping subgraphs, we obtain the final detected complexes. We conduct an empirical evaluation on four PPI datasets (Collins, Gavin, Krogan, and Wiphi) with two complex benchmarks (CYC2008 and MIPS). Experimental results show that our method outperforms state-of-the-art methods.
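The post-grouping steps described above (integrate two similarity matrices, cluster, map clusters onto the PIN, merge overlapping subgraphs) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph-embedding and GO similarity matrices are taken as given inputs, the FunCat grouping step is not shown, and the weighted average with weight `alpha`, the threshold clustering with `tau`, the `min_size` cutoff, and the overlap score |A∩B|²/(|A|·|B|) with threshold `omega` are all hypothetical choices made for this sketch.

```python
from itertools import combinations

def integrate(s_topo, s_go, alpha=0.5):
    """Hypothetical integration: weighted average of two n x n similarity matrices."""
    n = len(s_topo)
    return [[alpha * s_topo[i][j] + (1 - alpha) * s_go[i][j] for j in range(n)]
            for i in range(n)]

def threshold_clusters(proteins, sim, tau=0.5):
    """Stand-in clustering (not the paper's): link pairs with sim >= tau, return components."""
    parent = list(range(len(proteins)))

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(proteins)), 2):
        if sim[i][j] >= tau:
            parent[find(i)] = find(j)
    groups = {}
    for i, p in enumerate(proteins):
        groups.setdefault(find(i), set()).add(p)
    return list(groups.values())

def connected_subgraphs(nodes, pin, min_size=2):
    """Split the PIN subgraph induced by `nodes` into connected components (DFS)."""
    remaining, comps = set(nodes), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            u = stack.pop()
            for v in pin.get(u, ()):
                if v in remaining:
                    remaining.remove(v)
                    comp.add(v)
                    stack.append(v)
        if len(comp) >= min_size:  # hypothetical cutoff: drop isolated proteins
            comps.append(comp)
    return comps

def overlap(a, b):
    """Hypothetical overlap score |A & B|^2 / (|A| * |B|)."""
    return len(a & b) ** 2 / (len(a) * len(b))

def merge_once(subgraphs, omega=0.5):
    """One greedy merging round: fold each subgraph into the first kept one it overlaps enough."""
    merged = []
    for g in subgraphs:
        for m in merged:
            if overlap(g, m) >= omega:
                m |= g  # in-place union on the kept copy
                break
        else:
            merged.append(set(g))
    return merged

if __name__ == "__main__":
    proteins = ["A", "B", "C", "D", "E", "F"]
    block = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
    # Toy matrices: 1.0 within a block, 0.0 across blocks (illustrative values only)
    toy = [[1.0 if block[p] == block[q] else 0.0 for q in proteins] for p in proteins]
    sim = integrate(toy, toy)
    pin = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"E"}, "E": {"D"}}
    subgraphs = [g for c in threshold_clusters(proteins, sim)
                 for g in connected_subgraphs(c, pin)]
    print(merge_once(subgraphs))
```

In this toy run, the cluster {D, E, F} maps onto the PIN as the connected subgraph {D, E}, because F has no interactions there; the point is that mapping clusters back onto the network can split or shrink them before merging.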