Leung Ian X Y, Hui Pan, Liò Pietro, Crowcroft Jon
University of Cambridge, Cambridge CB3 0FD, UK.
Phys Rev E Stat Nonlin Soft Matter Phys. 2009 Jun;79(6 Pt 2):066107. doi: 10.1103/PhysRevE.79.066107. Epub 2009 Jun 16.
The recent boom of large-scale online social networks (OSNs) both enables and necessitates the use of parallelizable and scalable computational techniques for their analysis. We examine the problem of real-time community detection and a recently proposed linear time- O(m) on a network with m edges-label propagation, or "epidemic" community detection algorithm. We identify characteristics and drawbacks of the algorithm and extend it by incorporating different heuristics to facilitate reliable and multifunctional real-time community detection. With limited computational resources, we employ the algorithm on OSN data with 1 x 10(6) nodes and about 58 x 10(6) directed edges. Experiments and benchmarks reveal that the extended algorithm is not only faster but its community detection accuracy compares favorably over popular modularity-gain optimization algorithms known to suffer from their resolution limits.
近期大规模在线社交网络(OSN)的蓬勃发展使得对其进行分析时,采用可并行化和可扩展的计算技术成为可能且必要。我们研究了实时社区检测问题,以及一种最近提出的在具有m条边的网络上的线性时间O(m)——标签传播或“流行”社区检测算法。我们确定了该算法的特点和缺点,并通过纳入不同的启发式方法对其进行扩展,以促进可靠且多功能的实时社区检测。在计算资源有限的情况下,我们将该算法应用于具有1×10⁶个节点和约58×10⁶条有向边的OSN数据。实验和基准测试表明,扩展后的算法不仅速度更快,而且其社区检测准确率优于已知受分辨率限制的流行模块化增益优化算法。