Bettinelli Andrea, Hansen Pierre, Liberti Leo
Dipartimento di Tecnologie dell'Infomazione, Università degli Studi di Milano, via Bramante 65, Crema, Italy.
Phys Rev E Stat Nonlin Soft Matter Phys. 2012 Jul;86(1 Pt 2):016107. doi: 10.1103/PhysRevE.86.016107. Epub 2012 Jul 13.
Modularity maximization is extensively used to detect communities in complex networks. It has been shown, however, that this method suffers from a resolution limit: Small communities may be undetectable in the presence of larger ones even if they are very dense. To alleviate this defect, various modifications of the modularity function have been proposed as well as multiresolution methods. In this paper we systematically study a simple model (proposed by Pons and Latapy [Theor. Comput. Sci. 412, 892 (2011)] and similar to the parametric model of Reichardt and Bornholdt [Phys. Rev. E 74, 016110 (2006)]) with a single parameter α that balances the fraction of within community edges and the expected fraction of edges according to the configuration model. An exact algorithm is proposed to find optimal solutions for all values of α as well as the corresponding successive intervals of α values for which they are optimal. This algorithm relies upon a routine for exact modularity maximization and is limited to moderate size instances. An agglomerative hierarchical heuristic is therefore proposed to address parametric modularity detection in large networks. At each iteration the smallest value of α for which it is worthwhile to merge two communities of the current partition is found. Then merging is performed and the data are updated accordingly. An implementation is proposed with the same time and space complexity as the well-known Clauset-Newman-Moore (CNM) heuristic [Phys. Rev. E 70, 066111 (2004)]. Experimental results on artificial and real world problems show that (i) communities are detected by both exact and heuristic methods for all values of the parameter α; (ii) the dendrogram summarizing the results of the heuristic method provides a useful tool for substantive analysis, as illustrated particularly on a Les Misérables data set; (iii) the difference between the parametric modularity values given by the exact method and those given by the heuristic is moderate; (iv) the heuristic version of the proposed parametric method, viewed as a modularity maximization tool, gives better results than the CNM heuristic for large instances.
模块度最大化被广泛用于在复杂网络中检测社区。然而,已经表明,这种方法存在分辨率限制:即使小社区非常密集,在存在大社区的情况下也可能无法检测到。为了缓解这一缺陷,人们提出了模块度函数的各种修改方法以及多分辨率方法。在本文中,我们系统地研究了一个简单模型(由庞斯和拉塔皮提出[《理论计算机科学》412, 892 (2011)],类似于赖夏德特和博恩霍尔特[《物理评论E》74, 016110 (2006)]的参数模型)——该模型有一个单一参数α,它根据配置模型平衡社区内边的比例和边的预期比例。我们提出了一种精确算法来找到α所有值的最优解以及它们为最优解时相应的连续α值区间,但此算法依赖于精确模块度最大化的例程,并且仅限于中等规模的实例。因此,我们提出了一种凝聚层次启发式算法来处理大型网络中的参数化模块度检测。在每次迭代中,找到值得合并当前分区中两个社区的最小α值。然后进行合并并相应地更新数据,并给出了一种实现方法,其时间和空间复杂度与著名的克劳塞特 - 纽曼 - 摩尔(CNM)启发式算法[《物理评论E》70, 066111 (2004)]相同。针对人工和实际问题的实验结果表明:(i)对于参数α的所有值,精确方法和启发式方法都能检测到社区;(ii)总结启发式方法结果的树状图为实质性分析提供了一个有用的工具,特别是在《悲惨世界》数据集上得到了说明;(iii)精确方法给出的参数化模块度值与启发式方法给出的值之间的差异适中;(iv)作为一种模块度最大化工具,所提出的参数化方法的启发式版本对于大型实例给出的结果比CNM启发式算法更好。