College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China.
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China.
BMC Bioinformatics. 2018 Aug 22;19(1):305. doi: 10.1186/s12859-018-2309-9.
In recent decades, detecting protein complexes (PCs) from protein-protein interaction networks (PPINs) has been an active area of research. Many graph clustering methods identify PCs well. However, most existing methods overlook the inherent core-attachment organization of PCs, which leads to three major limitations. First, many methods neglect seed selection and, in particular, do not consider the effect of using overlapping nodes as seed nodes, which can cause false predictions. Second, PCs are generally assumed to be dense subgraphs, whereas it is subgraphs with high local modularity that usually correspond to PCs. Third, many available methods lack a mechanism for handling noise and miss some peripheral proteins. Addressing these issues is important for predicting more biologically meaningful overlapping PCs.
To overcome these weaknesses, we propose CALM, a clustering method based on core-attachment and local modularity structure, to detect overlapping PCs from noisy weighted PPINs. First, we identify overlapping nodes and seed nodes. Second, we compute a support function between a node and a cluster. In CALM, a cluster initially consists of a single seed node and is extended by recursively adding its direct neighbors according to the support function, until the cluster forms a subgraph with locally optimal modularity. Third, we repeat this process for the remaining seed nodes. Finally, merging and removal procedures yield the final predicted clusters. The experimental results show that CALM outperforms other classical methods in overall performance. Furthermore, CALM matches more complexes with higher accuracy and provides a better one-to-one mapping with reference complexes in all test datasets. CALM is also robust to high rates of noise in the PPIN.
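The seed-expansion step can be illustrated with a minimal sketch. The abstract does not define the support function or the local modularity measure, so both definitions below are assumptions chosen for illustration: support is taken as the fraction of a node's weighted degree that falls inside the cluster, and local modularity as the ratio of internal edge weight to internal plus boundary edge weight. The function names (`build_graph`, `support`, `local_modularity`, `expand_seed`) are hypothetical and are not taken from CALM.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def local_modularity(graph, cluster):
    """Assumed measure: internal weight / (internal weight + boundary weight)."""
    w_in = 0.0
    w_out = 0.0
    for u in cluster:
        for v, w in graph[u].items():
            if v in cluster:
                w_in += w  # each internal edge is visited from both endpoints
            else:
                w_out += w
    w_in /= 2.0
    total = w_in + w_out
    return w_in / total if total > 0 else 0.0


def support(graph, node, cluster):
    """Assumed measure: share of the node's weighted degree that lies in the cluster."""
    degree = sum(graph[node].values())
    if degree == 0:
        return 0.0
    inside = sum(w for v, w in graph[node].items() if v in cluster)
    return inside / degree


def expand_seed(graph, seed):
    """Greedily grow a cluster from a seed until local modularity stops improving."""
    cluster = {seed}
    while True:
        # Candidate nodes are the direct neighbors of the current cluster.
        frontier = {v for u in cluster for v in graph[u] if v not in cluster}
        if not frontier:
            break
        # Pick the best-supported neighbor; sorting makes tie-breaking deterministic.
        best = max(sorted(frontier), key=lambda v: support(graph, v, cluster))
        if local_modularity(graph, cluster | {best}) <= local_modularity(graph, cluster):
            break  # local optimum under the assumed measure
        cluster.add(best)
    return cluster


# Toy network: two triangles joined by one weak edge.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 0.5),
]
graph = build_graph(edges)
cluster_a = expand_seed(graph, "a")
cluster_d = expand_seed(graph, "d")
```

On this toy network, expansion from either seed stops at the weak bridge, because adding a node from the other triangle lowers the assumed local modularity. Seed selection, overlapping-node detection, and the merging and removal steps are not covered by this sketch.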
By considering core-attachment and local modularity structure, CALM detects PCs more effectively than several representative methods. In short, CALM could potentially identify previously undiscovered overlapping PCs with varying density and high modularity.