Heymans Institute for Psychology, Psychometrics & Statistics, University of Groningen, Grote Kruisstraat 2/1, 9712TS, Groningen, The Netherlands.
Behav Res Methods. 2013 Dec;45(4):1011-23. doi: 10.3758/s13428-013-0329-y.
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima, but that this problem can be handled reasonably well by using the partitions produced by various clustering procedures as starting points for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, performed similarly to subspace K-means in easy conditions, but its performance deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
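The remark about local minima and multiple starting partitions can be illustrated with a minimal sketch. It does not implement subspace K-means: ordinary K-means stands in for it, and random partitions stand in for the partitions that the abstract says were taken from various clustering procedures. The function names (`kmeans_from_partition`, `best_of_starts`), the toy data, and the handling of empty clusters are all assumptions made for illustration.

```python
import numpy as np

def kmeans_from_partition(X, labels, n_clusters, n_iter=100):
    """Plain K-means (a stand-in, not subspace K-means) started from a given partition."""
    labels = labels.copy()
    for _ in range(n_iter):
        # Centroid of each cluster; an empty cluster falls back to the overall mean
        # (an arbitrary choice made for this sketch).
        centroids = np.vstack([
            X[labels == k].mean(axis=0) if np.any(labels == k) else X.mean(axis=0)
            for k in range(n_clusters)
        ])
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition no longer changes
        labels = new_labels
    # Loss: summed squared distance of each point to its assigned (nearest) centroid.
    loss = dists[np.arange(X.shape[0]), labels].sum()
    return labels, loss

def best_of_starts(X, n_clusters, starts):
    """Run the algorithm from every starting partition and keep the lowest-loss result."""
    results = [kmeans_from_partition(X, s, n_clusters) for s in starts]
    return min(results, key=lambda r: r[1])

# Hypothetical toy data: three Gaussian clusters in five dimensions.
rng = np.random.default_rng(0)
means = rng.normal(scale=4.0, size=(3, 5))
X = np.vstack([m + rng.normal(size=(50, 5)) for m in means])

# Random starting partitions; in practice these could instead be the partitions
# returned by other clustering methods.
starts = [rng.integers(0, 3, size=X.shape[0]) for _ in range(10)]
best_labels, best_loss = best_of_starts(X, 3, starts)
print(best_loss)
```

Because each run only converges to a local minimum of its loss, different starting partitions can end at different solutions. Keeping the lowest-loss run is one common way to reduce, but not eliminate, that risk.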