Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, Tilburg, the Netherlands.
Psychol Methods. 2011 Mar;16(1):82-8; discussion 89-92. doi: 10.1037/a0020144.
Steinley and Brusco (2011) presented the results of a very large simulation study evaluating cluster recovery of mixture model clustering (MMC), both when the number of clusters is known and when it is unknown. They drew rather strong conclusions from this study, especially regarding the good performance of K-means (KM) compared with MMC. I agree with the authors' conclusion that KM may perform as well as MMC in certain situations, which are primarily the situations investigated by Steinley and Brusco. However, a weakness of their paper is its failure to investigate many important real-world situations in which theory suggests that MMC should outperform KM. This article elaborates on the KM-MMC comparison in terms of cluster recovery and provides additional simulation results showing that KM may perform much worse than MMC. Moreover, I show that KM is equivalent to a restricted mixture model estimated by maximizing the classification likelihood, and I comment on Steinley and Brusco's recommendation regarding the use of mixture models for clustering.
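
The equivalence claim can be illustrated with a small sketch. It is not the article's own code or simulation design. It assumes one particular restriction under which the equivalence holds: equal mixing proportions 1/K and a common spherical covariance sigma^2 I, with sigma^2 fixed at 1 for illustration. Under that restriction, the classification log-likelihood with hard assignments is a constant minus the within-cluster sum of squares divided by 2 sigma^2, so each K-means step that lowers the sum of squares raises the classification likelihood. All function names and the synthetic data below are hypothetical.

```python
import math
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Hard assignment: each point goes to its nearest center."""
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]

def update(points, labels, centers):
    """Replace each center by the mean of its members (keep it if empty)."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, z in zip(points, labels) if z == k]
        if members:
            new_centers.append(tuple(sum(m[j] for m in members) / len(members)
                                     for j in range(len(old))))
        else:
            new_centers.append(old)
    return new_centers

def wcss(points, labels, centers):
    """K-means objective: within-cluster sum of squares."""
    return sum(sq_dist(p, centers[z]) for p, z in zip(points, labels))

def classification_loglik(points, labels, centers, sigma2=1.0):
    """Classification log-likelihood of the restricted mixture (assumed here):
    equal proportions 1/K and a shared spherical covariance sigma2 * I."""
    k, d = len(centers), len(points[0])
    per_point_const = -math.log(k) - 0.5 * d * math.log(2 * math.pi * sigma2)
    return sum(per_point_const - sq_dist(p, centers[z]) / (2 * sigma2)
               for p, z in zip(points, labels))

def kmeans_trace(points, centers, n_iter=10):
    """Run Lloyd's algorithm, recording (WCSS, classification log-lik) per step."""
    trace = []
    for _ in range(n_iter):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
        trace.append((wcss(points, labels, centers),
                      classification_loglik(points, labels, centers)))
    return trace

# Synthetic two-cluster data, made up purely for illustration.
rng = random.Random(0)
points = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
          + [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(100)])
trace = kmeans_trace(points, centers=[points[0], points[1]])

for step, (w, cl) in enumerate(trace, start=1):
    print(f"step {step}: WCSS = {w:.3f}, classification log-lik = {cl:.3f}")
```

In the printed trace the two columns move in lockstep: the sum of squares never increases and the classification log-likelihood never decreases. The sketch says nothing about the situations in which MMC is expected to outperform KM, such as clusters with unequal sizes or unequal, non-spherical covariances, because the restricted model above rules those out by construction.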