Department of Analytics, Information Systems, & Supply Chain, Florida State University.
Department of Psychological Sciences, University of Missouri.
Psychol Methods. 2017 Sep;22(3):563-580. doi: 10.1037/met0000095. Epub 2016 Sep 8.
Partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods in the psychological literature for partitioning objects based on dichotomous measures. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data.
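To make the K-median idea concrete for dichotomous data, the following is a minimal sketch, not the procedure used in the study. It assumes the coordinate-wise variant of K-median: objects are assigned to the nearest center under Manhattan distance (which equals Hamming distance for 0/1 data), and each center is updated to the per-variable median, which for binary variables is a majority vote. The function name `k_median_binary`, its parameters, and the choice to seed centers from distinct response patterns are illustrative assumptions.

```python
import numpy as np

def k_median_binary(X, k, n_iter=100, seed=0):
    """Illustrative K-median clustering for 0/1 data (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=int)
    n = X.shape[0]

    # Assumption: seed the centers with k distinct response patterns,
    # so k must not exceed the number of unique rows in X.
    patterns = np.unique(X, axis=0)
    centers = patterns[rng.choice(len(patterns), size=k, replace=False)].copy()

    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Manhattan distance from every object to every center;
        # for binary data this is the Hamming distance.
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels

        # Update each center to the per-variable median of its members.
        # For 0/1 data this is a majority vote; exact ties resolve to 0.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = (members.mean(axis=0) > 0.5).astype(int)

    # Total within-cluster Manhattan distance (the K-median criterion).
    cost = int(np.abs(X - centers[labels]).sum())
    return labels, centers, cost
```

Unlike K-means, whose centroids are variable-wise proportions between 0 and 1, the centers here remain valid 0/1 response patterns. In practice one would typically run several random restarts and keep the solution with the lowest cost; that step is omitted here for brevity.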