Model-based clustering of microarray expression data via latent Gaussian mixture models.

Affiliation

Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, Canada.

Publication information

Bioinformatics. 2010 Nov 1;26(21):2705-12. doi: 10.1093/bioinformatics/btq498. Epub 2010 Aug 29.

Abstract

MOTIVATION

In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.
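For readers unfamiliar with the factor analysis covariance structure and the Bayesian information criterion mentioned above, the standard textbook forms are sketched below. The notation (p variables, q latent factors, G components, n observations, k free parameters) is assumed here for illustration and is not taken from the abstract, and the specific modification of the structure that yields the 12 EPGMM models is not reproduced.

```latex
% Gaussian mixture density with G components
f(\mathbf{x}) = \sum_{g=1}^{G} \pi_g \, \phi(\mathbf{x} \mid \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g),
\qquad \sum_{g=1}^{G} \pi_g = 1

% Factor analysis covariance structure for component g:
% \Lambda_g is a p x q loading matrix (q < p), \Psi_g is a p x p diagonal matrix
\boldsymbol{\Sigma}_g = \boldsymbol{\Lambda}_g \boldsymbol{\Lambda}_g^{\top} + \boldsymbol{\Psi}_g

% Free covariance parameters per component under this structure,
% compared with p(p+1)/2 for an unconstrained covariance matrix
pq - \tfrac{1}{2} q(q-1) + p

% Bayesian information criterion in one common sign convention (smaller is better);
% some authors use the negated form, where larger is better
\mathrm{BIC} = -2 \log \hat{L} + k \log n
```

Because the parameter count grows linearly in p for fixed q, rather than quadratically, a structure of this kind can remain estimable when few samples are available.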

RESULTS

The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.
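As an illustration of how the adjusted Rand index compares a clustering with a reference classification, the following is a minimal sketch using the standard pair-counting formula. The function name and the toy label vectors are hypothetical and are not taken from the article or its data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items.

    Illustrative helper (hypothetical name), based on the standard
    pair-counting formula over the contingency table.
    """
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivial agreement

    # Counts for each cell of the contingency table and for its margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together in each cell, row and column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # value expected by chance
    maximum = (sum_rows + sum_cols) / 2          # value for identical partitions
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    # Toy labels: the same grouping under different label names.
    print(adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"]))
```

The index equals 1 for identical partitions regardless of how clusters are named, and has expected value 0 under random assignment, which is why it is preferred over raw agreement rates for comparing clusterings.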

AVAILABILITY

The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info
