School of Information Science and Engineering, Central South University, Changsha 410083, China.
BMC Genomics. 2010 Nov 2;11 Suppl 2(Suppl 2):S10. doi: 10.1186/1471-2164-11-S2-S10.
Identification of protein complexes in large interaction networks is crucial to understand principles of cellular organization and predict protein functions, which is one of the most important issues in the post-genomic era. Each protein might be subordinate multiple protein complexes in the real protein-protein interaction networks. Identifying overlapping protein complexes from protein-protein interaction networks is a considerable research topic.
As an effective algorithm in identifying overlapping module structures, clique percolation method (CPM) has a wide range of application in social networks and biological networks. However, the recognition accuracy of algorithm CPM is lowly. Furthermore, algorithm CPM is unfit to identifying protein complexes with meso-scale when it applied in protein-protein interaction networks. In this paper, we propose a new topological model by extending the definition of k-clique community of algorithm CPM and introduced distance restriction, and develop a novel algorithm called CP-DR based on the new topological model for identifying protein complexes. In this new algorithm, the protein complex size is restricted by distance constraint to conquer the shortcomings of algorithm CPM. The algorithm CP-DR is applied to the protein interaction network of Sacchromyces cerevisiae and identifies many well known complexes.
The proposed algorithm CP-DR based on clique percolation and distance restriction makes it possible to identify dense subgraphs in protein interaction networks, a large number of which correspond to known protein complexes. Compared to algorithm CPM, algorithm CP-DR has more outstanding performance.
在大型相互作用网络中鉴定蛋白质复合物对于理解细胞组织的原理和预测蛋白质功能至关重要,这是后基因组时代最重要的问题之一。在真实的蛋白质-蛋白质相互作用网络中,每个蛋白质可能隶属于多个蛋白质复合物。从蛋白质-蛋白质相互作用网络中鉴定重叠的蛋白质复合物是一个重要的研究课题。
作为一种识别重叠模块结构的有效算法,团渗滤法(CPM)在社交网络和生物网络中得到了广泛的应用。然而,算法 CPM 的识别精度较低。此外,当应用于蛋白质-蛋白质相互作用网络时,算法 CPM 不适合识别中尺度的蛋白质复合物。在本文中,我们通过扩展算法 CPM 的 k-团社区定义并引入距离限制,提出了一种新的拓扑模型,并基于该新拓扑模型开发了一种新的算法,称为 CP-DR,用于识别蛋白质复合物。在这个新算法中,通过距离约束来限制蛋白质复合物的大小,克服了算法 CPM 的缺点。该算法 CP-DR 被应用于酿酒酵母的蛋白质相互作用网络,并鉴定出了许多已知的复合物。
基于团渗滤和距离限制的提出算法 CP-DR 使得在蛋白质相互作用网络中识别密集子图成为可能,其中许多子图对应于已知的蛋白质复合物。与算法 CPM 相比,算法 CP-DR 具有更好的性能。