Narayanan Tejaswini, Gersten Merril, Subramaniam Shankar, Grama Ananth
Department of Bioengineering, University of California, San Diego, USA.
BMC Res Notes. 2011 Dec 29;4:569. doi: 10.1186/1756-0500-4-569.
Many recent studies have investigated modularity in biological networks, and its role in functional and structural characterization of constituent biomolecules. A technique that has shown considerable promise in the domain of modularity detection is the Newman and Girvan (NG) algorithm, which relies on the number of shortest-paths across pairs of vertices in the network traversing a given edge, referred to as the betweenness of that edge. The edge with the highest betweenness is iteratively eliminated from the network, with the betweenness of the remaining edges recalculated in every iteration. This generates a complete dendrogram, from which modules are extracted by applying a quality metric called modularity denoted by Q. This exhaustive computation can be prohibitively expensive for large networks such as Protein-Protein Interaction Networks. In this paper, we present a novel optimization to the modularity detection algorithm, in terms of an efficient termination criterion based on a target edge betweenness value, using which the process of iterative edge removal may be terminated.
We validate the robustness of our approach by applying our algorithm on real-world protein-protein interaction networks of Yeast, C.Elegans and Drosophila, and demonstrate that our algorithm consistently has significant computational gains in terms of reduced runtime, when compared to the NG algorithm. Furthermore, our algorithm produces modules comparable to those from the NG algorithm, qualitatively and quantitatively. We illustrate this using comparison metrics such as module distribution, module membership cardinality, modularity Q, and Jaccard Similarity Coefficient.
We have presented an optimized approach for efficient modularity detection in networks. The intuition driving our approach is the extraction of holistic measures of centrality from graphs, which are representative of inherent modular structure of the underlying network, and the application of those measures to efficiently guide the modularity detection process. We have empirically evaluated our approach in the specific context of real-world large scale biological networks, and have demonstrated significant savings in computational time while maintaining comparable quality of detected modules.
最近许多研究探讨了生物网络中的模块化及其在组成生物分子的功能和结构表征中的作用。在模块化检测领域显示出巨大潜力的一种技术是纽曼和吉尔万(NG)算法,该算法依赖于网络中穿过给定边的顶点对之间的最短路径数量,即该边的介数。具有最高介数的边被迭代地从网络中移除,每次迭代都会重新计算剩余边的介数。这会生成一个完整的树形图,通过应用一个称为模块化的质量度量Q从中提取模块。对于诸如蛋白质 - 蛋白质相互作用网络这样的大型网络,这种详尽的计算成本可能过高。在本文中,我们提出了一种对模块化检测算法的新颖优化,即基于目标边介数值的有效终止标准,利用该标准可以终止迭代边移除过程。
我们通过将算法应用于酵母、秀丽隐杆线虫和果蝇的真实蛋白质 - 蛋白质相互作用网络来验证我们方法的稳健性,并证明与NG算法相比,我们的算法在运行时间减少方面始终具有显著的计算优势。此外,我们的算法在定性和定量方面产生的模块与NG算法的模块相当。我们使用模块分布、模块成员基数、模块化Q和杰卡德相似系数等比较指标来说明这一点。
我们提出了一种用于网络中高效模块化检测的优化方法。推动我们方法的直觉是从图中提取整体中心性度量,这些度量代表基础网络的固有模块化结构,并应用这些度量来有效地指导模块化检测过程。我们在真实世界大规模生物网络的特定背景下对我们的方法进行了实证评估,并证明在保持检测模块质量相当的同时,计算时间有显著节省。