Kim Sunmee, Choi Ji Yeh, Hwang Heungsun
McGill University.
Multivariate Behav Res. 2017 Jan-Feb;52(1):31-46. doi: 10.1080/00273171.2016.1246996. Epub 2016 Nov 21.
Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether a population consists of heterogeneous subgroups (clusters). These combined approaches classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret: they show explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, existing two-way approaches rely on hard classification, which assumes that each observation and/or variable category belongs to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means to simultaneously classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, which enables us to determine the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach, in comparison with existing two-way clustering approaches, through the analysis of simulated and real data.
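To make the two ingredients named in the abstract more concrete, the following NumPy sketch computes MCA row coordinates from a dummy-coded (indicator) matrix and then applies a regularized fuzzy k-means to those coordinates. It is not the authors' two-way method. It clusters observations only (the one-way case), it assumes the entropy-regularized form of fuzzy k-means, which minimizes sum_ik u_ik ||x_i - c_k||^2 + lam * sum_ik u_ik log u_ik so that memberships satisfy u_ik proportional to exp(-||x_i - c_k||^2 / lam), and it takes the regularization parameter `lam` as a fixed user input rather than choosing it automatically. The function names `mca_row_scores` and `entropy_fuzzy_kmeans` are hypothetical.

```python
import numpy as np


def mca_row_scores(Z, n_dims=2):
    """Principal row coordinates from a standard MCA of an indicator matrix.

    Z is an n x J dummy-coded matrix (one column per category of every
    variable). Assumes every category column is observed at least once.
    """
    Z = np.asarray(Z, dtype=float)
    P = Z / Z.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row (observation) masses
    c = P.sum(axis=0)                        # column (category) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: left singular vectors scaled by the singular
    # values and by the inverse square root of the row masses.
    return U[:, :n_dims] * s[:n_dims] / np.sqrt(r)[:, None]


def entropy_fuzzy_kmeans(X, k, lam, n_iter=100, tol=1e-8, seed=0):
    """Entropy-regularized fuzzy k-means (hypothetical helper, assumed form).

    Minimizes sum_ik u_ik ||x_i - c_k||^2 + lam * sum_ik u_ik log u_ik
    subject to each row of the membership matrix U summing to 1.
    Smaller lam gives harder memberships; larger lam gives fuzzier ones.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(k), size=X.shape[0])  # random initial memberships

    def centers_from(U):
        # Cluster centers are membership-weighted means of the observations.
        weights = np.maximum(U.sum(axis=0), 1e-300)[:, None]
        return (U.T @ X) / weights

    for _ in range(n_iter):
        C = centers_from(U)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # n x k squared distances
        logits = -d2 / lam
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        U_new = np.exp(logits)
        U_new /= U_new.sum(axis=1, keepdims=True)     # closed-form membership update
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, centers_from(U)


if __name__ == "__main__":
    # Toy data: two binary variables; columns are var1=A, var1=B, var2=X, var2=Y.
    Z = np.array([[1, 0, 1, 0]] * 10 + [[0, 1, 0, 1]] * 10)
    F = mca_row_scores(Z, n_dims=2)
    for lam in (0.1, 100.0):
        U, _ = entropy_fuzzy_kmeans(F, k=2, lam=lam)
        print(lam, np.round(U[[0, 10]], 3))
```

In this assumed formulation, a small `lam` pushes memberships toward hard 0/1 assignments, while a large `lam` spreads them toward 1/k. This is the sense in which the regularization parameter governs the degree of fuzziness.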