Gates Kathleen M, Henry Teague, Steinley Doug, Fair Damien A
Department of Psychology, University of North Carolina Chapel Hill, NC, USA.
Department of Psychological Sciences, University of Missouri Columbia, MO, USA.
Front Neuroinform. 2016 Nov 10;10:45. doi: 10.3389/fninf.2016.00045. eCollection 2016.
The past decade has been marked with a proliferation of community detection algorithms that aim to organize nodes (e.g., individuals, brain regions, variables) into modular structures that indicate subgroups, clusters, or communities. Motivated by the emergence of big data across many fields of inquiry, these methodological developments have primarily focused on the detection of communities of nodes from matrices that are very large. However, it remains unknown if the algorithms can reliably detect communities in smaller graph sizes (i.e., 1000 nodes and fewer) which are commonly used in brain research. More importantly, these algorithms have predominantly been tested only on binary or sparse count matrices and it remains unclear the degree to which the algorithms can recover community structure for different types of matrices, such as the often used cross-correlation matrices representing functional connectivity across predefined brain regions. Of the publicly available approaches for weighted graphs that can detect communities in graph sizes of at least 1000, prior research has demonstrated that Newman's spectral approach (i.e., Leading Eigenvalue), Walktrap, Fast Modularity, the Louvain method (i.e., multilevel community method), Label Propagation, and Infomap all recover communities exceptionally well in certain circumstances. The purpose of the present Monte Carlo simulation study is to test these methods across a large number of conditions, including varied graph sizes and types of matrix (sparse count, correlation, and reflected Euclidean distance), to identify which algorithm is optimal for specific types of data matrices. The results indicate that when the data are in the form of sparse count networks (such as those seen in diffusion tensor imaging), Label Propagation and Walktrap surfaced as the most reliable methods for community detection. For dense, weighted networks such as correlation matrices capturing functional connectivity, Walktrap consistently outperformed the other approaches for recovering communities.
在过去十年中,社区检测算法大量涌现,这些算法旨在将节点(例如个体、脑区、变量)组织成模块化结构,以指示子群体、簇或社区。受众多研究领域大数据出现的推动,这些方法的发展主要集中于从非常大的矩阵中检测节点社区。然而,对于这些算法能否在脑研究中常用的较小图规模(即1000个节点及以下)中可靠地检测社区,目前仍不清楚。更重要的是,这些算法主要仅在二元或稀疏计数矩阵上进行了测试,尚不清楚这些算法能够在多大程度上恢复不同类型矩阵(例如常用于表示预定义脑区之间功能连接的互相关矩阵)的社区结构。在可公开获取的、能够检测至少1000规模图中社区的加权图方法中,先前的研究表明,纽曼谱方法(即主特征值法)、随机游走算法、快速模块度算法、鲁汶方法(即多层社区方法)、标签传播算法和信息地图算法在某些情况下都能很好地恢复社区结构。本蒙特卡洛模拟研究的目的是在大量条件下测试这些方法,包括不同的图规模和矩阵类型(稀疏计数、相关性和反射欧几里得距离),以确定哪种算法对于特定类型的数据矩阵是最优的。结果表明,当数据为稀疏计数网络形式(如扩散张量成像中所见)时,标签传播算法和随机游走算法成为社区检测最可靠的方法。对于诸如捕获功能连接的相关矩阵这样的密集加权网络,随机游走算法在恢复社区方面始终优于其他方法。