Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill.
Department of Mathematics, University of North Carolina at Chapel Hill.
Psychol Methods. 2019 Dec;24(6):675-689. doi: 10.1037/met0000204. Epub 2019 Feb 11.
Psychological researchers often seek to obtain cluster solutions from sparse count matrices (e.g., social networks; counts of symptoms shared by 2 given individuals; structural brain imaging). Increasingly, community detection methods are being used to subset such data in a data-driven manner. While many of these approaches perform well in simulation studies and thus offer some improvement over traditional clustering approaches, there is no readily available approach for evaluating the robustness of their solutions in empirical data: researchers have no way of knowing whether their results are due to noise. We describe here 2 approaches, novel to the field of psychology, that enable evaluation of cluster solution robustness. This tutorial also explains the use of an associated R package, perturbR, which enables researchers to apply the methods described herein. In the first approach, the cluster assignment from the original matrix is compared against cluster assignments obtained after randomly perturbing the edges in the matrix; stable cluster solutions should not change greatly under small perturbations. In the second approach, Monte Carlo simulations generate random matrices that have the same properties as the original matrix. The distribution of quality scores ("modularity") obtained from the cluster solutions for these matrices is then compared with the score obtained from the original matrix. From this, one can assess whether the results are better than would be expected by chance. perturbR automates these 2 methods, providing an easy-to-use resource for psychological researchers. We demonstrate the utility of this package using benchmark simulated data generated in a previous study and then apply the methods to publicly available empirical data from social networks and structural neuroimaging. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
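The first approach can be sketched in plain Python. Everything below is illustrative, not perturbR's actual implementation: `perturb_edges` uses a simple delete-and-add rewiring of a binary adjacency matrix, and the adjusted Rand index (Hubert & Arabie, 1985) is shown as one common way to quantify agreement between two cluster assignments.

```python
from collections import Counter
import random

def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index between two cluster labelings."""
    comb2 = lambda x: x * (x - 1) // 2
    sum_ij = sum(comb2(v) for v in Counter(zip(a, b)).values())
    sum_a = sum(comb2(v) for v in Counter(a).values())
    sum_b = sum(comb2(v) for v in Counter(b).values())
    expected = sum_a * sum_b / comb2(len(a))
    max_index = (sum_a + sum_b) / 2
    # Identical degenerate labelings (e.g., everything in one cluster) get ARI 1.
    return 1.0 if max_index == expected else (sum_ij - expected) / (max_index - expected)

def perturb_edges(A, alpha, rng=random):
    """Rewire a fraction alpha of the edges of a symmetric 0/1 adjacency
    matrix: delete k existing edges and add k new ones elsewhere, so the
    total edge count is preserved.  A simplified stand-in for the
    perturbation scheme the package implements."""
    n = len(A)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    edges = [p for p in pairs if A[p[0]][p[1]]]
    nonedges = [p for p in pairs if not A[p[0]][p[1]]]
    k = min(round(alpha * len(edges)), len(nonedges))
    B = [row[:] for row in A]
    for i, j in rng.sample(edges, k):
        B[i][j] = B[j][i] = 0
    for i, j in rng.sample(nonedges, k):
        B[i][j] = B[j][i] = 1
    return B
```

In practice one would recluster each perturbed matrix with the same community detection routine used on the original matrix and track how similarity to the original assignment decays as the perturbation fraction grows; a robust solution stays near-identical under small perturbations.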
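The second approach can be sketched the same way. The Newman-Girvan modularity formula below is standard, but `greedy_cluster` is a deliberately naive stand-in for a real community detection algorithm (adequate only for tiny graphs), and the random graphs matched on size and edge count are a simplification of the matched null matrices the package generates.

```python
import random

def modularity(A, labels):
    """Newman-Girvan modularity Q of a node partition of an undirected graph."""
    n = len(A)
    two_m = sum(map(sum, A))  # each undirected edge contributes twice
    deg = [sum(row) for row in A]
    q = sum(A[i][j] - deg[i] * deg[j] / two_m
            for i in range(n) for j in range(n) if labels[i] == labels[j])
    return q / two_m

def greedy_cluster(A):
    """Tiny agglomerative Q-maximizer: repeatedly merge the pair of
    communities that most improves Q until no merge helps."""
    labels = list(range(len(A)))
    while True:
        base = modularity(A, labels)
        best_gain, best = 1e-12, None
        groups = sorted(set(labels))
        for a in groups:
            for b in groups:
                if a < b:
                    trial = [b if l == a else l for l in labels]
                    gain = modularity(A, trial) - base
                    if gain > best_gain:
                        best_gain, best = gain, trial
        if best is None:
            return labels
        labels = best

def null_modularity_pvalue(A, reps=100, rng=random):
    """Monte Carlo p-value: the proportion of random graphs with the same
    size and edge count whose best-found Q is at least the observed Q."""
    n = len(A)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    m = sum(map(sum, A)) // 2
    q_obs = modularity(A, greedy_cluster(A))
    hits = 0
    for _ in range(reps):
        R = [[0] * n for _ in range(n)]
        for i, j in rng.sample(pairs, m):
            R[i][j] = R[j][i] = 1
        hits += modularity(R, greedy_cluster(R)) >= q_obs
    return hits / reps
```

A small p-value here means the observed modularity exceeds what same-density random matrices typically achieve, i.e., the community structure is better than chance.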