Mixtures of common t-factor analyzers for clustering high-dimensional microarray data.

Author information

Department of Statistics, Chonnam National University, Gwangju, South Korea.

Publication information

Bioinformatics. 2011 May 1;27(9):1269-76. doi: 10.1093/bioinformatics/btr112. Epub 2011 Mar 3.

DOI:10.1093/bioinformatics/btr112

PMID:21372081

Abstract

MOTIVATION

Mixtures of factor analyzers enable model-based clustering to be undertaken for high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. Moreover, when the number of clusters is not small, for example, where there are several different types of cancer, there may be a need to further reduce the number of parameters in the specification of the component-covariance matrices. This further reduction can be achieved by using mixtures of factor analyzers with common component-factor loadings (MCFA), a more parsimonious model. However, this approach is sensitive to both non-normality and outliers, which are commonly observed in microarray experiments. The sensitivity arises because MCFA is based on a mixture model in which the multivariate normal family of distributions is assumed for the component-error and factor distributions.
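
The parameter saving can be made concrete by comparing the two covariance structures. The notation below (g components, p genes, q factors, loading matrices B_i and A, diagonal matrices D_i and D, factor means ξ_i and factor covariances Ω_i) is an assumption taken from the usual formulation in the mixture-of-factor-analyzers literature; the abstract itself does not define these symbols.

```latex
% Mixtures of factor analyzers (MFA): component-specific loadings
\Sigma_i = B_i B_i^{\top} + D_i, \qquad i = 1, \dots, g

% Mixtures of common factor analyzers (MCFA): one shared loading matrix A
\mu_i = A \xi_i, \qquad \Sigma_i = A \Omega_i A^{\top} + D, \qquad i = 1, \dots, g
```

Under MFA each component carries its own p × q matrix B_i, so the number of loading parameters grows roughly as gpq. Under MCFA a single p × q matrix A is shared, and each component contributes only a q-dimensional ξ_i and a q × q matrix Ω_i, which is small when q is much less than p.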

RESULTS

An extension to mixtures of t-factor analyzers with common component-factor loadings is considered, whereby the multivariate t-family is adopted for the component-error and factor distributions. An EM algorithm is developed for fitting mixtures of common t-factor analyzers. The model can handle data with tails longer than those of the normal distribution, is robust against outliers and allows the data to be displayed in low-dimensional plots. It is applied here to both synthetic data and some microarray gene expression data for clustering, where it performs better than several existing methods.
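
To illustrate the kind of heavy-tailed data such a model describes, here is a minimal sketch that simulates observations from a mixture of common t-factor analyzers. It is not the authors' Matlab code and it does not fit the model. It assumes the standard scale-mixture representation of the multivariate t (a gamma-distributed weight that rescales both factors and errors). The function name `sample_mctfa` and all parameter names are hypothetical.

```python
import numpy as np

def sample_mctfa(n, pi, A, xi, omega, d, nu, seed=0):
    """Simulate n observations from a mixture of common t-factor analyzers.

    Assumed (illustrative) parameterization:
      pi    : (g,)      mixing proportions
      A     : (p, q)    loading matrix shared by all components
      xi    : (g, q)    component factor means
      omega : (g, q, q) component factor covariance matrices
      d     : (p,)      diagonal of the common error covariance
      nu    : (g,)      component degrees of freedom
    """
    rng = np.random.default_rng(seed)
    g = len(pi)
    p, _ = A.shape
    labels = rng.choice(g, size=n, p=pi)
    Y = np.empty((n, p))
    for j, i in enumerate(labels):
        # Gamma weight with shape nu/2 and rate nu/2 (numpy scale = 2/nu);
        # small w inflates the variance, which produces the long tails.
        w = rng.gamma(shape=nu[i] / 2.0, scale=2.0 / nu[i])
        u = rng.multivariate_normal(xi[i], omega[i] / w)  # factor scores
        e = rng.normal(0.0, np.sqrt(d / w))               # error term, length p
        Y[j] = A @ u + e
    return Y, labels

# Example with made-up dimensions: p = 50 genes, q = 2 factors, g = 3 clusters.
rng = np.random.default_rng(1)
p, q, g = 50, 2, 3
A = rng.standard_normal((p, q))
xi = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
omega = np.stack([np.eye(q)] * g)
d = np.full(p, 0.5)
Y, labels = sample_mctfa(200, np.array([0.5, 0.3, 0.2]), A, xi, omega, d,
                         nu=np.array([3.0, 3.0, 3.0]))
```

Small degrees of freedom (here 3) give occasional extreme observations, while large values approach the normal MCFA case. The two-dimensional factor scores are what would be shown in a low-dimensional plot.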

AVAILABILITY

The algorithms were implemented in Matlab. The Matlab code is available at http://blog.naver.com/aggie100.

Similar articles

Segmentation and intensity estimation of microarray images using a gamma-t mixture model.

Bioinformatics. 2007 Feb 15;23(4):458-65. doi: 10.1093/bioinformatics/btl630. Epub 2006 Dec 12.

Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data.

IEEE Trans Pattern Anal Mach Intell. 2010 Jul;32(7):1298-309. doi: 10.1109/TPAMI.2009.149.

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.

Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.

Biclustering of gene expression data by an extension of mixtures of factor analyzers.

Int J Biostat. 2008;4(1):Article 3. doi: 10.2202/1557-4679.1078.

Model-based clustering with gene ranking using penalized mixtures of heavy-tailed distributions.

J Bioinform Comput Biol. 2013 Jun;11(3):1341007. doi: 10.1142/S0219720013410072. Epub 2013 Mar 21.

Including probe-level measurement error in robust mixture clustering of replicated microarray gene expression.

Stat Appl Genet Mol Biol. 2010;9:Article42. doi: 10.2202/1544-6115.1600. Epub 2010 Dec 9.

Penalized mixtures of factor analyzers with application to clustering high-dimensional microarray data.

Bioinformatics. 2010 Feb 15;26(4):501-8. doi: 10.1093/bioinformatics/btp707. Epub 2009 Dec 23.

Classification of microarray data with factor mixture models.

Bioinformatics. 2006 Jan 15;22(2):202-8. doi: 10.1093/bioinformatics/bti779. Epub 2005 Nov 15.

A mixture model-based approach to the clustering of microarray expression data.

Bioinformatics. 2002 Mar;18(3):413-22. doi: 10.1093/bioinformatics/18.3.413.

Cited by

densityCut: an efficient and versatile topological approach for automatic clustering of biological data.

Bioinformatics. 2016 Sep 1;32(17):2567-76. doi: 10.1093/bioinformatics/btw227. Epub 2016 Apr 23.

Statistical Significance of Clustering using Soft Thresholding.

J Comput Graph Stat. 2015;24(4):975-993. doi: 10.1080/10618600.2014.948179. Epub 2015 Dec 10.

Distributed Density Estimation Based on a Mixture of Factor Analyzers in a Sensor Network.

Sensors (Basel). 2015 Aug 5;15(8):19047-68. doi: 10.3390/s150819047.

SMART: unique splitting-while-merging framework for gene clustering.

PLoS One. 2014 Apr 8;9(4):e94141. doi: 10.1371/journal.pone.0094141. eCollection 2014.

Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm.

PLoS One. 2013 Jun 17;8(6):e66256. doi: 10.1371/journal.pone.0066256. Print 2013.

Unsupervised Bayesian linear unmixing of gene expression microarrays.

BMC Bioinformatics. 2013 Mar 19;14:99. doi: 10.1186/1471-2105-14-99.
