Silva Filipi N, Albeshri Aiiad, Thayananthan Vijey, Alhalabi Wadee, Fortunato Santo
Indiana University Network Science Institute (IUNI), Bloomington, Indiana, 47408, USA.
Department of Computer Science, Faculty of Computing and Information Technology King Abdulaziz University, Jeddah 21589, Kingdom of Saudi Arabia.
Phys Rev E. 2022 May;105(5-1):054308. doi: 10.1103/PhysRevE.105.054308.
A basic question in network community detection is how modular a given network is. This is usually addressed by evaluating the quality of partitions detected in the network. The Girvan-Newman (GN) modularity function is the standard way to make this assessment, but it has a number of drawbacks. Most importantly, it is not clearly interpretable, given that the measure can take relatively large values on partitions of random networks without communities. Here we propose a measure based on the concept of robustness: modularity is the probability to find trivial partitions when the structure of the network is randomly perturbed. This concept can be implemented for any clustering algorithm capable of telling when a group structure is absent. Tests on artificial and real graphs reveal that robustness modularity can be used to assess and compare the strength of the community structure of different networks. We also introduce two other quality functions: modularity difference, a suitably normalized version of the GN modularity, and information modularity, a measure of distance based on information compression. Both measures are strongly correlated with robustness modularity, but have lower time complexity, so they could be used on networks whose size makes the calculation of robustness modularity too costly.
网络社区检测中的一个基本问题是给定网络的模块化程度如何。这通常通过评估在网络中检测到的分区质量来解决。Girvan - Newman(GN)模块化函数是进行此评估的标准方法,但它有许多缺点。最重要的是,它不太容易解释,因为该度量在没有社区的随机网络分区上可能会取相对较大的值。在此,我们提出一种基于稳健性概念的度量:模块化是当网络结构受到随机扰动时找到平凡分区的概率。这个概念可以应用于任何能够判断何时不存在群组结构的聚类算法。对人工和真实图的测试表明,稳健性模块化可用于评估和比较不同网络社区结构的强度。我们还引入了另外两个质量函数:模块化差异,它是GN模块化的适当归一化版本;以及信息模块化,一种基于信息压缩的距离度量。这两种度量都与稳健性模块化高度相关,但时间复杂度较低,因此可用于那些网络规模使得计算稳健性模块化成本过高的网络。