Overall J E, Gibson J M, Novy D M
Department of Psychiatry and Behavioral Sciences, University of Texas Medical School, Houston 77225.
J Clin Psychol. 1993 Jul;49(4):459-70. doi: 10.1002/1097-4679(199307)49:4<459::aid-jclp2270490402>3.0.co;2-p.
A comparative evaluation of the population recovery capabilities of 35 cluster analysis methods, defined by the combinations of 5 profile similarity measures and 7 agglomeration rules, was undertaken using artificial data that represented duplicate mixture samples from 4 latent populations. The latent population mean profiles differed primarily in elevation or in pattern parameters. Latent population sampling variances were controlled to provide two different levels of realistic overlap. The within-population distributions were multivariate normal with diagonal covariance structure. Across all conditions examined, complete linkage and Ward's minimum variance methods, used with Euclidean or city block interprofile distance measures, performed best. Single linkage, median, and centroid methods were substantially inferior for clustering individuals in accordance with true population memberships.
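The design described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' procedure: the mean profiles, sample sizes, and standard deviations are hypothetical; only four agglomeration rules and two distance measures are covered; a single mixture sample is drawn rather than duplicate samples; and the adjusted Rand index is used as a stand-in recovery criterion, since the abstract does not name the one used in the study. It relies on NumPy and SciPy (`pdist`, `linkage`, `fcluster`).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

N_VARS = 8  # hypothetical profile length


def simulate_mixture(rng, kind="elevation", n_per_pop=25, sd=1.0):
    """Draw one mixture sample from 4 hypothetical latent populations."""
    if kind == "elevation":
        # Flat profiles that differ only in overall level (assumed values).
        means = np.outer([-1.5, -0.5, 0.5, 1.5], np.ones(N_VARS))
    else:
        # Zero-mean profiles that differ only in shape (assumed values).
        means = np.array([
            [1, -1, 1, -1, 1, -1, 1, -1],
            [1, 1, -1, -1, 1, 1, -1, -1],
            [1, 1, 1, 1, -1, -1, -1, -1],
            [1, -1, -1, 1, 1, -1, -1, 1],
        ], dtype=float)
    truth = np.repeat(np.arange(4), n_per_pop)
    # Independent normal noise per variable gives a diagonal covariance.
    X = means[truth] + rng.normal(0.0, sd, size=(truth.size, N_VARS))
    return X, truth


def adjusted_rand(a, b):
    """Adjusted Rand index between two labelings (stand-in criterion)."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)  # contingency table of co-memberships

    def pairs(x):
        return (x * (x - 1) / 2).sum()

    n = len(ai)
    sum_ij = pairs(table)
    sum_a = pairs(table.sum(axis=1))
    sum_b = pairs(table.sum(axis=0))
    expected = sum_a * sum_b / (n * (n - 1) / 2)
    return (sum_ij - expected) / (0.5 * (sum_a + sum_b) - expected)


def compare(kind="elevation", sd=1.0, seed=0):
    """Score each (distance, linkage) pair on one simulated sample."""
    rng = np.random.default_rng(seed)
    X, truth = simulate_mixture(rng, kind=kind, sd=sd)
    results = {}
    for metric in ("euclidean", "cityblock"):
        d = pdist(X, metric=metric)
        for method in ("single", "complete", "average", "ward"):
            # Note: SciPy defines Ward's method for Euclidean distances;
            # with city block input it is only a heuristic update rule.
            Z = linkage(d, method=method)
            labels = fcluster(Z, t=4, criterion="maxclust")
            results[(metric, method)] = adjusted_rand(truth, labels)
    return results


if __name__ == "__main__":
    for key, ari in sorted(compare().items(), key=lambda kv: -kv[1]):
        print(key, round(ari, 3))
```

Calling `compare` with different `kind` and `sd` values roughly mimics the elevation versus pattern conditions and the two overlap levels. Results from a single seed are noisy, so any ranking of methods should be averaged over many simulated samples before it is compared with the findings reported above.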