IEEE/ACM Trans Comput Biol Bioinform. 2018 May-Jun;15(3):892-904. doi: 10.1109/TCBB.2016.2642107. Epub 2016 Dec 20.
This paper presents a graph clustering algorithm, called EGCPI, for discovering protein complexes in protein-protein interaction (PPI) networks. EGCPI takes into account both the network topology and the attributes of interacting proteins, both of which have been shown to be important for protein complex discovery. It formulates the problem as an optimization problem and solves it with evolutionary clustering. Given a PPI network, EGCPI first annotates each protein with the corresponding attributes provided in the Gene Ontology (GO) database. It then uses a similarity measure that takes the network topology into account to evaluate how similar connected proteins are. Using this measure, EGCPI applies an evolutionary strategy to discover a number of graph clusters within which proteins are densely connected. Finally, EGCPI identifies protein complexes in each discovered cluster based on the homogeneity of attributes between pairs of proteins. EGCPI has been tested on several real data sets, and the experimental results show that EGCPI is very effective at protein complex discovery and that evolutionary clustering helps identify protein complexes in PPI networks. The EGCPI software can be downloaded from https://github.com/hetiantian1985/EGCPI.
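The abstract does not specify EGCPI's similarity measure, fitness function, or homogeneity rule. The Python sketch below therefore substitutes common choices to illustrate the four-step pipeline, and it is not the authors' implementation. It uses Jaccard similarity of closed neighbourhoods as the topological similarity, weighted modularity as the fitness optimized by a simple (mu + lambda) evolutionary strategy, and mean pairwise GO-term Jaccard similarity as the homogeneity test. The toy network, the GO annotations, and all function names and parameters (`tau`, `pop_size`, `generations`) are hypothetical and are not taken from the EGCPI software.

```python
import random

# Hypothetical toy PPI network: two triangles joined by the edge C-D.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]

# Hypothetical GO annotations (illustrative only, not real GO data).
GO = {"A": {"GO:1", "GO:2"}, "B": {"GO:1"}, "C": {"GO:1", "GO:3"},
      "D": {"GO:4"}, "E": {"GO:4", "GO:5"}, "F": {"GO:4"}}


def build_graph(edges):
    """Undirected adjacency sets: protein -> set of interacting proteins."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def edge_weights(adj):
    """Topological similarity of each connected pair: Jaccard similarity of
    closed neighbourhoods (an assumed stand-in for EGCPI's own measure)."""
    return {(u, v): jaccard(adj[u] | {u}, adj[v] | {v})
            for u in adj for v in adj[u] if u < v}


def modularity(weights, labels):
    """Weighted modularity of a labeling, used here as the assumed fitness."""
    total = sum(weights.values())
    inside, strength = {}, {}
    for (u, v), w in weights.items():
        for x in (u, v):
            strength[labels[x]] = strength.get(labels[x], 0.0) + w
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0.0) + w
    return sum(inside.get(c, 0.0) / total - (s / (2 * total)) ** 2
               for c, s in strength.items())


def mutate(adj, labels, rng):
    """Move one random protein into the cluster of a random neighbour."""
    child = dict(labels)
    node = rng.choice(sorted(adj))
    child[node] = child[rng.choice(sorted(adj[node]))]
    return child


def evolve(adj, weights, pop_size=8, generations=200, seed=0):
    """(mu + lambda) evolutionary strategy over cluster labelings; parents
    compete with children, so the best fitness never decreases."""
    rng = random.Random(seed)
    start = {n: n for n in adj}  # every protein starts in its own cluster
    population = [start] * pop_size
    for _ in range(generations):
        children = [mutate(adj, p, rng) for p in population]
        pool = population + children
        pool.sort(key=lambda lab: modularity(weights, lab), reverse=True)
        population = pool[:pop_size]
    return population[0]


def complexes(labels, go, tau=0.4):
    """Within each cluster, keep proteins whose mean GO-term Jaccard
    similarity to the other members is >= tau (an assumed homogeneity rule)."""
    clusters = {}
    for node, c in labels.items():
        clusters.setdefault(c, []).append(node)
    found = []
    for members in clusters.values():
        if len(members) < 2:
            continue
        kept = [p for p in members
                if sum(jaccard(go[p], go[q]) for q in members if q != p)
                / (len(members) - 1) >= tau]
        if len(kept) >= 2:
            found.append(sorted(kept))
    return sorted(found)


if __name__ == "__main__":
    adj = build_graph(EDGES)
    weights = edge_weights(adj)
    labels = evolve(adj, weights)
    print(round(modularity(weights, labels), 4), complexes(labels, GO))
```

Modularity is used as the fitness here because it penalizes both extremes: the all-in-one partition scores 0 and the all-singletons partition scores below 0. A naive "sum of intra-cluster similarity" objective would instead be maximized trivially by placing every protein in one cluster.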