Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China.
Molecules. 2017 Dec 8;22(12):2179. doi: 10.3390/molecules22122179.
Most proteins perform their biological functions while interacting as complexes. Detecting protein complexes is important not only for understanding the relationship between the functions and structures of biological networks, but also for predicting the functions of unknown proteins. We present a new nodal metric that integrates a node's local topological information. The metric reflects how well the node represents a cluster within a larger local neighborhood of a protein-protein interaction (PPI) network. Based on this metric, we propose a seed-expansion graph clustering algorithm (SEGC) for detecting protein complexes in PPI networks. A roulette wheel strategy is used to select seeds, which enhances the diversity of the clustering. For a candidate node, we define its closeness to a cluster by combining the density of the cluster with the connections between the node and the cluster. In SEGC, a cluster that initially consists of only a seed node is extended by recursively adding nodes from its neighbors according to this closeness, until all neighbors fail the expansion test. We compare the F-measure and accuracy of the proposed SEGC algorithm with those of other algorithms on protein interaction networks. The experimental results show that SEGC outperforms the other algorithms under full coverage.
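The seed-expansion procedure summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas for the nodal metric or the closeness, so `node_weight`, `closeness`, the `threshold` parameter, and the stopping rule below are hypothetical placeholders chosen only to make the control flow concrete; they are not the definitions used by SEGC. The loop that keeps seeding until every node belongs to some cluster is likewise an assumption about what "full coverage" means.

```python
import random


def build_graph(edges):
    """Build undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_weight(adj, u):
    # Placeholder nodal metric (assumption, not the paper's metric):
    # degree plus the number of edges among the node's neighbors.
    nbrs = adj[u]
    links = sum(1 for a in nbrs for b in adj[a] if b in nbrs) // 2
    return len(nbrs) + links


def density(adj, cluster):
    """Fraction of possible edges inside the cluster that are present."""
    n = len(cluster)
    if n < 2:
        return 1.0
    edges = sum(len(adj[u] & cluster) for u in cluster) / 2
    return 2 * edges / (n * (n - 1))


def closeness(adj, u, cluster):
    # Placeholder closeness (assumption): share of cluster members linked
    # to u, scaled by the density of the cluster after adding u.
    return len(adj[u] & cluster) / len(cluster) * density(adj, cluster | {u})


def expand(adj, seed, threshold=0.5):
    """Grow a cluster from a seed until no neighbor passes the threshold."""
    cluster = {seed}
    while True:
        frontier = set().union(*(adj[u] for u in cluster)) - cluster
        if not frontier:
            break
        best = max(sorted(frontier), key=lambda u: closeness(adj, u, cluster))
        if closeness(adj, best, cluster) < threshold:
            break
        cluster.add(best)
    return cluster


def segc_sketch(edges, threshold=0.5, rng_seed=0):
    """Pick seeds by roulette wheel and expand until all nodes are covered."""
    adj = build_graph(edges)
    rng = random.Random(rng_seed)
    uncovered = set(adj)
    clusters = []
    while uncovered:
        candidates = sorted(uncovered)
        weights = [node_weight(adj, u) for u in candidates]
        # Roulette wheel: selection probability proportional to weight.
        seed = rng.choices(candidates, weights=weights, k=1)[0]
        cluster = expand(adj, seed, threshold)
        clusters.append(cluster)
        uncovered -= cluster
    return clusters


if __name__ == "__main__":
    # Toy network: two triangles joined by the edge c-d.
    toy = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
           ("d", "e"), ("e", "f"), ("d", "f")]
    for cluster in segc_sketch(toy):
        print(sorted(cluster))
```

Because seeds are drawn at random in proportion to their weight, different values of `rng_seed` can yield different clusterings, which is the diversity effect the abstract attributes to the roulette wheel strategy. Clusters produced this way may overlap, since expansion is allowed to revisit nodes that are already covered.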