Ma Xiuli, Zhou Guangyu, Shang Jingbo, Wang Jingjing, Peng Jian, Han Jiawei
1 Key Laboratory of Machine Perception (MOE), School of EECS, Peking University , Beijing, China .
2 Department of Computer Science, University of California , Los Angeles, California.
J Comput Biol. 2017 Sep;24(9):923-941. doi: 10.1089/cmb.2017.0037. Epub 2017 Jun 1.
Protein-protein interaction (PPI) networks, providing a comprehensive landscape of protein interaction patterns, enable us to explore biological processes and cellular components at multiple resolutions. For a biological process, a number of proteins need to work together to perform a job. Proteins densely interact with each other, forming large molecular machines or cellular building blocks. Identification of such densely interconnected clusters or protein complexes from PPI networks enables us to obtain a better understanding of the hierarchy and organization of biological processes and cellular components. However, most existing graph clustering algorithms on PPI networks often cannot effectively detect densely connected subgraphs and overlapped subgraphs. In this article, we formulate the problem of complex detection as diversified dense subgraph mining and introduce a novel approximation algorithm to efficiently enumerate putative protein complexes from biological networks. The key insight of our algorithm is that instead of enumerating all dense subgraphs, we only need to find a small diverse subset of subgraphs that cover as many proteins as possible. The problem is modeled as finding a diverse set of maximal dense subgraphs where we develop highly effective pruning techniques to guarantee efficiency. To scale up to large networks, we devise a divide-and-conquer approach to speed up the algorithm in a distributed manner. By comparing with existing clustering and dense subgraph-based algorithms on several yeast and human PPI networks, we demonstrate that our method can detect more putative protein complexes and achieves better prediction accuracy.
蛋白质-蛋白质相互作用(PPI)网络提供了蛋白质相互作用模式的全面图景,使我们能够在多个分辨率下探索生物过程和细胞成分。对于一个生物过程,许多蛋白质需要协同工作来完成一项任务。蛋白质之间紧密相互作用,形成大分子机器或细胞构建模块。从PPI网络中识别出这些紧密相连的簇或蛋白质复合物,能够使我们更好地理解生物过程和细胞成分的层次结构与组织方式。然而,大多数现有的关于PPI网络的图聚类算法往往无法有效地检测到紧密连接的子图和重叠子图。在本文中,我们将复杂检测问题表述为多样化的密集子图挖掘,并引入一种新颖的近似算法,以有效地从生物网络中枚举假定的蛋白质复合物。我们算法的关键见解在于,不是枚举所有的密集子图,而是只需要找到一个小的多样化子图子集,使其覆盖尽可能多的蛋白质。该问题被建模为寻找一组多样化的最大密集子图,在此过程中我们开发了高效的剪枝技术以保证效率。为了扩展到大型网络,我们设计了一种分治方法,以分布式方式加速算法。通过在几个酵母和人类PPI网络上与现有的聚类算法和基于密集子图的算法进行比较,我们证明了我们的方法能够检测到更多的假定蛋白质复合物,并取得更好的预测准确性。