James P. Bagrow
Department of Engineering Sciences and Applied Mathematics, Northwestern Institute on Complex Systems, Northwestern University, Evanston, Illinois 60208, USA.
Phys Rev E Stat Nonlin Soft Matter Phys. 2012 Jun;85(6 Pt 2):066118. doi: 10.1103/PhysRevE.85.066118. Epub 2012 Jun 15.
Much effort has gone into understanding the modular nature of complex networks. Communities, also known as clusters or modules, are typically considered to be densely interconnected groups of nodes that are only sparsely connected to other groups in the network. Discovering high-quality communities is a difficult and important problem in a number of areas. The most popular approach uses an objective function known as modularity, both to discover communities and to measure their strength. To understand the modular structure of networks, it is therefore crucial to know how such functions evaluate different topologies, what features they account for, and what implicit assumptions they may make. We show that trees and treelike networks can have unexpectedly and often arbitrarily high values of modularity. This is surprising, since trees are maximally sparse connected graphs and are not typically considered to possess modular structure; yet the nonlocal null model used by modularity assigns low probabilities, and thus high significance, to the densities of these sparse tree communities. We further study the practical performance of popular methods on model trees and on a genealogical data set and find that the discovered communities also have very high modularity, often approaching its maximum value. Statistical tests reveal the communities in trees to be significant, in contrast with known results for partitions of sparse, random graphs.
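To make the claim about trees concrete, the sketch below computes the standard Newman–Girvan modularity, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The (d_c/2m)² term is the configuration-model null model, which is the "nonlocal" null model referred to above. The simplest tree, a path, is cut into contiguous blocks. The helper names (`path_tree`, `contiguous_partition`) and the choice of roughly √n blocks are illustrative assumptions of this sketch, not the paper's method or data.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan modularity Q = sum_c [L_c/m - (d_c/(2m))^2].

    edges: list of (u, v) pairs of an undirected simple graph.
    membership: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Observed fraction of internal edges minus the null-model expectation.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def path_tree(n):
    """Hypothetical helper: a path on n nodes, a tree with m = n - 1 edges."""
    return [(i, i + 1) for i in range(n - 1)]


def contiguous_partition(n, k):
    """Hypothetical helper: split nodes 0..n-1 into k contiguous blocks."""
    size = -(-n // k)  # ceiling division
    return {i: i // size for i in range(n)}


if __name__ == "__main__":
    for n in (100, 10_000):
        k = round(n ** 0.5)  # assumed block count, for illustration only
        q = modularity(path_tree(n), contiguous_partition(n, k))
        print(f"n={n}, blocks={k}, Q={q:.3f}")
```

Each block of the path is joined to its neighbors by a single edge, so almost every edge is internal while the null-model penalty shrinks as the number of blocks grows. Q therefore climbs toward 1 as the path gets longer, even though no block is denser than any other part of the tree.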