Mixtures of common t-factor analyzers for clustering high-dimensional microarray data.

Affiliation

Department of Statistics, Chonnam National University, Gwangju, South Korea.

Publication information

Bioinformatics. 2011 May 1;27(9):1269-76. doi: 10.1093/bioinformatics/btr112. Epub 2011 Mar 3.

Abstract

MOTIVATION

Mixtures of factor analyzers enable model-based clustering to be undertaken for high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. Moreover, when the number of clusters is not small, for example, where there are several different types of cancer, there may be the need to reduce further the number of parameters in the specification of the component-covariance matrices. A further reduction can be achieved by using mixtures of factor analyzers with common component-factor loadings (MCFA), which is a more parsimonious model. However, this approach is sensitive to both non-normality and outliers, which are commonly observed in microarray experiments. This sensitivity of the MCFA approach is due to its being based on a mixture model in which the multivariate normal family of distributions is assumed for the component-error and factor distributions.
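To make the parsimony argument concrete, the following minimal sketch counts free parameters for three normal mixture parameterizations: unrestricted component covariances, mixtures of factor analyzers (MFA), and MCFA. The counting formulas are assumptions based on the usual conventions for these models (including the identifiability corrections q(q-1)/2 and q^2) and are not quoted from this abstract. The function name and the example values of p, g and q are hypothetical.

```python
def parameter_counts(p: int, g: int, q: int) -> dict:
    """Count free parameters for g-component mixtures on p variables with q factors.

    Assumed conventions (not taken from the abstract):
      full : mixing proportions + means + unrestricted covariances
      mfa  : component-specific loadings (p x q) and diagonal error variances
      mcfa : one shared loading matrix A (p x q), one shared diagonal D,
             component-specific factor means (q) and factor covariances (q x q)
    """
    mixing = g - 1  # mixing proportions sum to one

    full = mixing + g * p + g * p * (p + 1) // 2

    # Each MFA component: p*q loadings + p error variances,
    # minus q(q-1)/2 for rotational indeterminacy of the loadings.
    mfa = mixing + g * p + g * (p * q + p - q * (q - 1) // 2)

    # MCFA: shared A and D, component-specific factor means and covariances,
    # minus q^2 because A is only identified up to a nonsingular q x q transform.
    mcfa = mixing + g * q + g * q * (q + 1) // 2 + p * q + p - q * q

    return {"full": full, "mfa": mfa, "mcfa": mcfa}


if __name__ == "__main__":
    # Hypothetical microarray-like setting: many genes, few clusters and factors.
    print(parameter_counts(p=1000, g=3, q=2))
    # {'full': 1504502, 'mfa': 11999, 'mcfa': 3013}
```

Under these assumed formulas, the MFA count grows with g times p, while the MCFA count grows with p only once. This is why sharing the loadings matters most when the number of clusters is not small.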

RESULTS

An extension to mixtures of t-factor analyzers with common component-factor loadings is considered, whereby the multivariate t-family is adopted for the component-error and factor distributions. An EM algorithm is developed for the fitting of mixtures of common t-factor analyzers. The model can handle data with tails longer than those of the normal distribution, is robust against outliers and allows the data to be displayed in low-dimensional plots. It is applied here to the clustering of both synthetic data and some microarray gene expression data, where it performs better than several existing methods.
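The robustness claim can be illustrated with the weight that a standard EM algorithm for multivariate t-distributions assigns to each observation. This is a minimal sketch of that generic weight, not the authors' Matlab implementation. The function names are hypothetical, and an identity scale matrix is assumed purely for brevity; in a common-factor model the component scale matrix would instead have a low-rank-plus-diagonal form.

```python
def squared_distance_identity(y, mu):
    """Squared Mahalanobis distance under an ASSUMED identity scale matrix."""
    return sum((a - b) ** 2 for a, b in zip(y, mu))


def t_weight(d2: float, nu: float, p: int) -> float:
    """Standard t-distribution E-step weight: (nu + p) / (nu + d2).

    d2 : squared Mahalanobis distance of the observation from the component mean
    nu : degrees of freedom (small nu means heavier tails)
    p  : dimension of the observation
    """
    return (nu + p) / (nu + d2)


if __name__ == "__main__":
    mu = [0.0, 0.0]
    nu = 4.0  # hypothetical degrees of freedom
    points = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]  # the last point is an outlier

    weights = [t_weight(squared_distance_identity(y, mu), nu, p=2) for y in points]
    for y, w in zip(points, weights):
        print(y, round(w, 4))
```

Observations far from the component mean receive weights close to zero, so they contribute little to the updated mean and scale estimates. As nu grows large, every weight tends to one and the update reduces to the normal-mixture case, which weights the outlier as heavily as any other point.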

AVAILABILITY

The algorithms were implemented in Matlab. The Matlab code is available at http://blog.naver.com/aggie100.
