Division of Biostatistics, School of Public Health and School of Statistics, University of Minnesota, Minneapolis, MN, USA.
Bioinformatics. 2010 Feb 15;26(4):501-8. doi: 10.1093/bioinformatics/btp707. Epub 2009 Dec 23.
Model-based clustering has been widely used, e.g. in microarray data analysis. Since for high-dimensional data variable selection is necessary, several penalized model-based clustering methods have been proposed to realize simultaneous variable selection and clustering. However, the existing methods all assume that the variables are independent with the use of diagonal covariance matrices.
To model non-independence of variables (e.g. correlated gene expressions) while alleviating the problem of the large number of unknown parameters associated with a general non-diagonal covariance matrix, we generalize the mixture of factor analyzers to a penalized version, which, among other things, can effectively realize variable selection. We use simulated data and real microarray data to illustrate the utility and advantages of the proposed method over several existing ones.
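The parameter-count argument can be made concrete with a minimal sketch. It assumes the standard factor-analysis form for one cluster's covariance, Sigma = Lambda Lambda^T + Psi, with a p x q loading matrix Lambda and a diagonal Psi. The symbols, the function names and the example sizes are illustrative assumptions, not taken from the abstract, and the penalization step itself is not implemented here.

```python
def covariance_param_counts(p, q):
    """Free parameters in one cluster's p x p covariance under three structures.

    p: number of variables (e.g. genes); q: number of latent factors.
    Both names are illustrative assumptions.
    """
    diagonal = p                          # independent variables
    full = p * (p + 1) // 2               # general symmetric covariance
    # p*q loadings plus p uniquenesses, minus q(q-1)/2 because the
    # loadings are only identified up to a rotation.
    factor_analytic = p * q + p - q * (q - 1) // 2
    return {"diagonal": diagonal, "full": full, "factor_analytic": factor_analytic}


def factor_covariance(loadings, uniquenesses):
    """Build Sigma = Lambda Lambda^T + Psi from nested lists (pure Python)."""
    p = len(loadings)
    return [
        [
            sum(a * b for a, b in zip(loadings[i], loadings[j]))
            + (uniquenesses[i] if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical sizes: 1000 variables, 2 latent factors.
counts = covariance_param_counts(p=1000, q=2)

# Variables 0 and 1 load on the same factor, so they are correlated;
# variable 2 loads on a different factor and is uncorrelated with them.
sigma = factor_covariance([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [1.0, 1.0, 1.0])
```

With few factors, the factor-analytic structure allows off-diagonal covariance while its parameter count grows linearly in p rather than quadratically, which is the trade-off the abstract describes.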