Nakuci Johan, Rahnev Dobromir
U.S. Army DEVCOM Army Research Laboratory, Aberdeen, Maryland, USA.
School of Psychology, Georgia Institute of Technology, Atlanta, Georgia, USA.
Hum Brain Mapp. 2025 Sep;46(13):e70330. doi: 10.1002/hbm.70330.
Clustering algorithms are essential tools in data-driven research, enabling the discovery of hidden structure in complex datasets. In neuroimaging, clustering and other data-driven approaches have been instrumental in identifying hidden relationships. However, exploratory techniques can produce erroneous results unless they are properly validated. Here we address this issue by examining three widely used approaches: K-means, community detection via modularity maximization, and hierarchical clustering. We first outline their methodologies, applications, and limitations. We then discuss the critical steps of a rigorous validation strategy. We further show how to apply these steps using both synthetic and real data, and provide code to facilitate their application. By placing clustering within a robust methodological framework, we demonstrate the potential of clustering-based analyses to reveal meaningful patterns and provide practical guidelines for their application in neuroscience and related fields. Clustering, when appropriately applied, is a powerful and indispensable computational method.
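As an illustration of what validating a clustering on synthetic data can look like, the following is a minimal sketch and not the code released with the paper. It assumes three well-separated 2-D Gaussian groups as ground truth, a plain Lloyd's K-means with random restarts, the mean silhouette width as the internal criterion for choosing the number of clusters, and the unadjusted Rand index as the external check against the known labels. All function names, parameter values, and the choice of metrics are assumptions made for this example.

```python
import math
import random


def make_blobs(centers, n_per, sd, rng):
    """Draw n_per 2-D points around each center (isotropic Gaussian noise)."""
    points, labels = [], []
    for idx, (cx, cy) in enumerate(centers):
        for _ in range(n_per):
            points.append((rng.gauss(cx, sd), rng.gauss(cy, sd)))
            labels.append(idx)
    return points, labels


def kmeans(points, k, rng, n_init=20, n_iter=100):
    """Lloyd's algorithm with random restarts; keeps the lowest-inertia run."""
    best_labels, best_inertia = None, math.inf
    for _ in range(n_init):
        centroids = rng.sample(points, k)  # initialise from k distinct data points
        labels = None
        for _ in range(n_iter):
            new_labels = [
                min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points
            ]
            if new_labels == labels:  # assignments stopped changing
                break
            labels = new_labels
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # keep the old centroid if a cluster empties
                    centroids[c] = tuple(
                        sum(axis) / len(members) for axis in zip(*members)
                    )
        inertia = sum(math.dist(p, centroids[lab]) ** 2
                      for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(points, labels):
    """Mean silhouette width; values near 1 indicate compact, separated clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(v for c, v in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def rand_index(a, b):
    """Unadjusted Rand index: share of point pairs on which two partitions agree."""
    pairs = agree = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            pairs += 1
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / pairs


rng = random.Random(0)  # fixed seed so the sketch is reproducible
points, truth = make_blobs([(0, 0), (6, 0), (3, 6)], n_per=30, sd=0.7, rng=rng)

results = {}
for k in range(2, 7):
    labels = kmeans(points, k, rng)
    results[k] = (silhouette(points, labels), rand_index(labels, truth))

best_k = max(results, key=lambda k: results[k][0])
for k, (sil, ri) in results.items():
    print(f"k={k}  silhouette={sil:.3f}  rand_index={ri:.3f}")
print("k with highest silhouette:", best_k)
```

The point of the synthetic step is that the true grouping is known, so the internal criterion (silhouette) can be checked against an external one (agreement with the ground truth) before the same pipeline is trusted on real data. In practice an established library implementation and a chance-corrected agreement measure would usually be preferable to this hand-written version.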