Bowman F DuBois, Patel Rajan, Lu Chengxing
Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, USA.
Hum Brain Mapp. 2004 Oct;23(2):109-19. doi: 10.1002/hbm.20050.
Data-driven statistical methods are useful for examining the spatial organization of human brain function. Cluster analysis is one approach that aims to identify spatial classifications of temporal brain activity profiles. Numerous clustering algorithms are available, and no one method is optimal for all areas of application because an algorithm's performance depends on specific characteristics of the data. K-means and fuzzy clustering are popular for neuroimaging analyses, and select hierarchical procedures also appear in the literature. It is unclear which clustering methods perform best for neuroimaging data. We conduct a simulation study, based on PET neuroimaging data, to evaluate the performances of several clustering algorithms, including a new procedure that builds on the kth nearest neighbor method. We also examine three stopping rules that assist in determining the optimal number of clusters. Five hierarchical clustering algorithms, some of which are new to neuroimaging analyses, perform best in our study, with Ward's and the beta-flexible methods exhibiting the strongest performances. Furthermore, Ward's and the beta-flexible methods yield the best performances for noisy data, and the popular K-means and fuzzy clustering procedures also perform reasonably well. The stopping rules also exhibit good performances for the top five clustering algorithms, and the pseudo-T² and pseudo-F stopping rules are superior for noisy data. Based on our simulations for both noisy and unscaled PET neuroimaging data, we recommend the combined use of the pseudo-F or pseudo-T² stopping rule along with either Ward's or the beta-flexible clustering algorithm.
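As an illustration of the recommended combination, the following is a minimal sketch of Ward's agglomerative clustering paired with a pseudo-F stopping rule, written in plain Python. It is not the authors' implementation. The pseudo-F statistic is taken here to be the Calinski-Harabasz ratio of between-cluster to within-cluster variance, which is an assumption about the exact definition used in the study. The function names (`ward_partitions`, `pseudo_f`) are hypothetical, and the data are synthetic four-point "temporal profiles", not PET measurements.

```python
import random

def centroid(points):
    """Mean vector of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_partitions(data):
    """Naive Ward agglomeration (hypothetical helper).

    Starts with one cluster per observation and repeatedly merges the pair
    whose merge gives the smallest increase in within-cluster sum of squares.
    Returns a dict mapping each cluster count k to a list of labels.
    """
    clusters = [[i] for i in range(len(data))]
    partitions = {}
    while True:
        k = len(clusters)
        labels = [0] * len(data)
        for c, members in enumerate(clusters):
            for i in members:
                labels[i] = c
        partitions[k] = labels
        if k == 1:
            return partitions
        cents = [centroid([data[i] for i in m]) for m in clusters]
        best = None
        for a in range(k):
            for b in range(a + 1, k):
                na, nb = len(clusters[a]), len(clusters[b])
                # Ward criterion: increase in within-cluster sum of squares.
                cost = na * nb / (na + nb) * sq_dist(cents[a], cents[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

def pseudo_f(data, labels):
    """Pseudo-F, assumed here to be (B / (k - 1)) / (W / (n - k))."""
    n, groups = len(data), sorted(set(labels))
    k = len(groups)
    grand = centroid(data)
    within = between = 0.0
    for g in groups:
        members = [data[i] for i, lab in enumerate(labels) if lab == g]
        cen = centroid(members)
        within += sum(sq_dist(p, cen) for p in members)
        between += len(members) * sq_dist(cen, grand)
    return (between / (k - 1)) / (within / (n - k))

# Synthetic data (an assumption for illustration): three groups of 20 noisy
# profiles, each profile observed at four time points.
rng = random.Random(0)
centers = [[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 5]]
data = [[m + rng.gauss(0, 0.3) for m in c] for c in centers for _ in range(20)]

partitions = ward_partitions(data)
# Stopping rule: evaluate pseudo-F over candidate cluster counts, keep the max.
scores = {k: pseudo_f(data, partitions[k]) for k in range(2, 9)}
best_k = max(scores, key=scores.get)
print("selected number of clusters:", best_k)
```

With groups this well separated, the pseudo-F maximum would be expected at three clusters. The naive pairwise search is cubic in the number of observations, so for image-sized data a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would be the more practical choice.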