Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Esch-sur-Alzette, Luxembourg.
Computer Science Division, Aeronautics Institute of Technology (ITA), São José dos Campos, Brazil.
PLoS One. 2022 Jan 27;17(1):e0260484. doi: 10.1371/journal.pone.0260484. eCollection 2022.
Identifying protein complexes in protein-protein interaction (PPI) networks is often handled as a community detection problem, with algorithms generally relying exclusively on the network topology to discover a solution. The advancement of experimental techniques on PPI has motivated the generation of many Gene Ontology (GO) databases. Incorporating the functionality extracted from GO with the topological properties of the underlying PPI network yields a novel approach to identifying protein complexes. Additionally, most existing algorithms use global measures that operate on the entire network to identify communities. The result of using global metrics is large communities that are often not correlated with the functionality of the proteins. Moreover, PPI network analysis shows that most biological functions possibly lie between local neighbours in PPI networks, which are not identifiable with global metrics. In this paper, we propose a local community detection algorithm (LCDA-GO) that uniquely exploits functional information from GO combined with the network topology. LCDA-GO identifies the community of each protein based on the topological and functional knowledge acquired solely from the local neighbour proteins within the PPI network. Experimental results using the Krogan dataset demonstrate that, in most cases, our algorithm outperforms state-of-the-art approaches in assessments based on Precision, Sensitivity, and particularly Composite Score. We also deployed LCDA, the local-topology-based precursor of LCDA-GO, to compare with a similar state-of-the-art approach that exclusively incorporates topological information of PPI networks for community detection. In addition to the high quality of the results, one main advantage of LCDA-GO is its low computational time complexity.
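The abstract does not spell out how LCDA-GO scores neighbours, so the following is only a minimal sketch of the general idea of combining local topology with GO annotations, not the authors' algorithm. It assumes a Jaccard overlap of closed neighbourhoods as the topological term, a Jaccard overlap of GO term sets as the functional term, and a hypothetical weight `alpha` between them. The function names, the protein labels, and the `term_1`/`term_2` annotations are all illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def local_communities(adjacency, go_terms, alpha=0.5):
    """Group proteins by linking each one to its best-scoring neighbour.

    Assumption (not from the paper): the score is a weighted sum of
    topological and functional Jaccard similarity, using local data only.
    """
    best_neighbour = {}
    for p, nbrs in adjacency.items():
        best, best_score = p, 0.0
        for q in sorted(nbrs):  # sorted for deterministic tie-breaking
            topo = jaccard(adjacency[p] | {p}, adjacency[q] | {q})
            func = jaccard(go_terms.get(p, ()), go_terms.get(q, ()))
            score = alpha * topo + (1 - alpha) * func
            if score > best_score:
                best, best_score = q, score
        best_neighbour[p] = best

    # Merge each protein with its chosen neighbour using union-find.
    root = {p: p for p in adjacency}

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for p, q in best_neighbour.items():
        root[find(p)] = find(q)

    groups = {}
    for p in adjacency:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Toy network: two triangles joined by the single edge C-D.
adjacency = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
go_terms = {
    "A": {"term_1"}, "B": {"term_1"}, "C": {"term_1"},
    "D": {"term_2"}, "E": {"term_2"}, "F": {"term_2"},
}
communities = local_communities(adjacency, go_terms)
print(communities)
```

Because each protein inspects only its direct neighbours, the scoring step in this sketch never needs a global view of the network, which is the property the abstract contrasts with global metrics.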