Good Benjamin H, de Montjoye Yves-Alexandre, Clauset Aaron
Department of Physics, Swarthmore College, Swarthmore, Pennsylvania 19081, USA and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA.
Phys Rev E Stat Nonlin Soft Matter Phys. 2010 Apr;81(4 Pt 2):046106. doi: 10.1103/PhysRevE.81.046106. Epub 2010 Apr 15.
Although widely used in practice, the behavior and accuracy of the popular module identification technique called modularity maximization is not well understood in practical contexts. Here, we present a broad characterization of its performance in such situations. First, we revisit and clarify the resolution limit phenomenon for modularity maximization. Second, we show that the modularity function Q exhibits extreme degeneracies: it typically admits an exponential number of distinct high-scoring solutions and typically lacks a clear global maximum. Third, we derive the limiting behavior of the maximum modularity Qmax for one model of infinitely modular networks, showing that it depends strongly both on the size of the network and on the number of modules it contains. Finally, using three real-world metabolic networks as examples, we show that the degenerate solutions can fundamentally disagree on many, but not all, partition properties such as the composition of the largest modules and the distribution of module sizes. These results imply that the output of any modularity maximization procedure should be interpreted cautiously in scientific contexts. They also explain why many heuristics are often successful at finding high-scoring partitions in practice and why different heuristics can disagree on the modular structure of the same network. We conclude by discussing avenues for mitigating some of these behaviors, such as combining information from many degenerate solutions or using generative models.
尽管模块化最大化这一流行的模块识别技术在实践中被广泛应用,但其在实际情况下的行为和准确性尚未得到很好的理解。在此,我们对其在这类情况下的性能进行了全面的描述。首先,我们重新审视并阐明了模块化最大化的分辨率极限现象。其次,我们表明模块化函数Q表现出极端的简并性:它通常允许指数数量的不同高分解决方案,并且通常缺乏明确的全局最大值。第三,我们推导了无限模块化网络的一个模型中最大模块化Qmax的极限行为,表明它强烈依赖于网络的大小及其包含的模块数量。最后,以三个真实世界的代谢网络为例,我们表明简并解在许多但并非所有的划分属性上可能存在根本分歧,比如最大模块的组成和模块大小的分布。这些结果意味着在科学背景下,任何模块化最大化程序的输出都应谨慎解释。它们还解释了为什么许多启发式方法在实践中常常能成功找到高分划分,以及为什么不同的启发式方法可能在同一网络的模块化结构上存在分歧。我们通过讨论缓解其中一些行为的途径来结束本文,比如结合来自许多简并解的信息或使用生成模型。
Phys Rev E Stat Nonlin Soft Matter Phys. 2010-4
Phys Rev E Stat Nonlin Soft Matter Phys. 2012-6
Neural Comput. 2014-12
Proc Natl Acad Sci U S A. 2007-1-2
Phys Rev E Stat Nonlin Soft Matter Phys. 2014-11
Behav Res Methods. 2023-10
Phys Rev E Stat Nonlin Soft Matter Phys. 2015-1
Proc Natl Acad Sci U S A. 2014-12-23
Phys Rev E Stat Nonlin Soft Matter Phys. 2011-5
Neurobiol Lang (Camb). 2025-9-2
J Phys Complex. 2025-3-1
bioRxiv. 2025-7-10
PLoS One. 2025-7-2
Biol Psychiatry Glob Open Sci. 2025-4-30