Wang Yan, Chen Qiong, Yang Lili, Yang Sen, He Kai, Xie Xuping
Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.
School of Artificial Intelligence, Jilin University, Changchun, China.
Front Genet. 2021 Jun 23;12:689515. doi: 10.3389/fgene.2021.689515. eCollection 2021.
With the rapid development of bioinformatics, researchers have applied community detection algorithms to detect functional modules in protein-protein interaction (PPI) networks. These modules help predict the function of unknown proteins at the molecular level and further reveal the regularity of cell activity. Clusters in a PPI network may overlap, because a single protein can be involved in multiple functional modules. To identify overlapping structures in protein functional modules, this paper proposes a novel overlapping community detection algorithm based on the neighboring local clustering coefficient (NLC). The contributions of the NLC algorithm are threefold: (i) it combines the edge-based community detection method with local expansion in seed selection and uses the local clustering coefficient of neighboring nodes to improve the accuracy of seed selection; (ii) it improves a method of measuring the distance between edges so that the community division is more accurate; (iii) it applies a community optimization strategy to excessively overlapping nodes, which makes the overlapping structure more reasonable. Experimental results on standard networks, Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks, and PPI networks show that the NLC algorithm improves the extended modularity (EQ) and normalized mutual information (NMI) values of the community division, which verifies that the algorithm can both detect reasonable communities and identify overlapping structures in networks.
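As an illustration of two quantities named in the abstract, the following minimal sketch computes the standard local clustering coefficient of a node and the extended modularity (EQ) of a possibly overlapping cover. It is not the NLC algorithm: the abstract does not specify the paper's neighbor-based seed score, edge distance, or optimization step, so none of them is reproduced here. The EQ formula used is the commonly cited one, in which each node pair's contribution is divided by the product of the numbers of communities the two nodes belong to. The function names, the toy graph `adj`, and the two covers are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Standard local clustering coefficient of node v (undirected graph)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count edges that exist between pairs of v's neighbors.
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def extended_modularity(adj, communities):
    """EQ of a cover: a list of node sets that are allowed to overlap."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2 * number of edges
    # O[v] = number of communities that contain node v.
    membership = {v: 0 for v in adj}
    for community in communities:
        for v in community:
            membership[v] += 1
    eq = 0.0
    for community in communities:
        for i in community:
            for j in community:
                a_ij = 1.0 if j in adj[i] else 0.0
                expected = len(adj[i]) * len(adj[j]) / two_m
                eq += (a_ij - expected) / (membership[i] * membership[j])
    return eq / two_m


# Hypothetical toy graph: two triangles that share node 2.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3, 4},
    3: {2, 4},
    4: {2, 3},
}

overlapping_cover = [{0, 1, 2}, {2, 3, 4}]  # node 2 belongs to both
disjoint_cover = [{0, 1, 2}, {3, 4}]        # node 2 forced into one side

if __name__ == "__main__":
    print("C(2) =", local_clustering(adj, 2))
    print("EQ overlapping =", extended_modularity(adj, overlapping_cover))
    print("EQ disjoint    =", extended_modularity(adj, disjoint_cover))
```

On this toy graph the overlapping cover scores a higher EQ than the disjoint one, which is the kind of comparison the EQ metric is meant to support when a node plausibly belongs to more than one module.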