Model-based clustering of microarray expression data via latent Gaussian mixture models.

Affiliation

Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, Canada.

Publication information

Bioinformatics. 2010 Nov 1;26(21):2705-12. doi: 10.1093/bioinformatics/btq498. Epub 2010 Aug 29.

DOI:10.1093/bioinformatics/btq498

PMID:20802251

Abstract

MOTIVATION

In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.
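To illustrate why a factor analysis covariance structure helps when there are many variables and few samples, the sketch below compares free-parameter counts for one component covariance matrix and evaluates the Bayesian information criterion. It is a minimal illustration, not the authors' implementation: the function names are hypothetical, the count assumes an unconstrained factor-analytic covariance of the form ΛΛ' + Ψ with p variables and q latent factors, and the BIC sign convention (lower is better) is an assumption that may differ from the one used in the paper.

```python
import math


def full_cov_params(p):
    """Free parameters in an unrestricted p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_cov_params(p, q):
    """Free parameters in Lambda Lambda' + Psi (assumed unconstrained form).

    p * q loadings, minus q(q-1)/2 for rotational invariance,
    plus p diagonal noise variances.
    """
    return p * q - q * (q - 1) // 2 + p


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'lower is better' convention (an assumption here)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


if __name__ == "__main__":
    p, q = 100, 2
    print("full covariance:", full_cov_params(p))
    print("factor-analytic covariance:", factor_cov_params(p, q))
```

With p = 100 and q = 2 the factor-analytic form needs a few hundred parameters per component instead of several thousand, and that reduction is what makes estimation feasible with a small number of samples.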

RESULTS

The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.
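The adjusted Rand index mentioned above compares a predicted clustering with a reference partition, correcting for chance agreement. The following is a small standard-library sketch of the usual pair-counting formula. The function name and the handling of degenerate inputs are illustrative choices, not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as perfect agreement

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Same partition under different label names: perfect agreement.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

A value of 1 indicates identical partitions up to relabelling, values near 0 indicate agreement no better than chance, and negative values are possible.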

AVAILABILITY

The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info

Similar articles

A mixture model with random-effects components for clustering correlated gene-expression profiles.

Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.

Clustering of change patterns using Fourier coefficients.

Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.

Including probe-level measurement error in robust mixture clustering of replicated microarray gene expression.

Stat Appl Genet Mol Biol. 2010;9:Article42. doi: 10.2202/1544-6115.1600. Epub 2010 Dec 9.

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data.

Bioinformatics. 2005 Jul 1;21(13):3025-33. doi: 10.1093/bioinformatics/bti466. Epub 2005 Apr 28.

Variable selection for model-based high-dimensional clustering and its application to microarray data.

Biometrics. 2008 Jun;64(2):440-8. doi: 10.1111/j.1541-0420.2007.00922.x. Epub 2007 Oct 26.

Clustering microarray gene expression data using weighted Chinese restaurant process.

Bioinformatics. 2006 Aug 15;22(16):1988-97. doi: 10.1093/bioinformatics/btl284. Epub 2006 Jun 9.

Bayesian finite Markov mixture model for temporal multi-tissue polygenic patterns.

Biom J. 2009 Feb;51(1):56-69. doi: 10.1002/bimj.200710489.

Microarray data clustering based on temporal variation: FCV with TSD preclustering.

Appl Bioinformatics. 2003;2(1):35-45.

Cited by

A Statistical Learning-Based Clustering Model With Features Selection to Identify Dyslexia in School-Aged Children.

Dyslexia. 2025 Nov;31(4):e70013. doi: 10.1002/dys.70013.

A novel multislice framework for precision 3D spatial domain reconstruction and disease pathology analysis.

Genome Res. 2025 Aug 1;35(8):1794-1808. doi: 10.1101/gr.280281.124.

GeM-LR: Discovering predictive biomarkers for small datasets in vaccine studies.

PLoS Comput Biol. 2024 Nov 14;20(11):e1012581. doi: 10.1371/journal.pcbi.1012581. eCollection 2024 Nov.

Barriers of the CNS transfer rate dynamics in patients with vascular cognitive impairment and dementia.

Front Aging Neurosci. 2024 Sep 25;16:1462302. doi: 10.3389/fnagi.2024.1462302. eCollection 2024.

Average Entropy of Gaussian Mixtures.

Entropy (Basel). 2024 Aug 1;26(8):659. doi: 10.3390/e26080659.

Research on Using K-Means Clustering to Explore High-Risk Products with Ethylene Oxide Residues and Their Manufacturers in Taiwan.

Foods. 2024 Aug 11;13(16):2510. doi: 10.3390/foods13162510.

Pathway-based analyses of gene expression profiles at low doses of ionizing radiation.

Front Bioinform. 2024 May 14;4:1280971. doi: 10.3389/fbinf.2024.1280971. eCollection 2024.

Clustering microbiome data using mixtures of logistic normal multinomial models.

Sci Rep. 2023 Sep 7;13(1):14758. doi: 10.1038/s41598-023-41318-8.

A comprehensive survey on computational learning methods for analysis of gene expression data.

Front Mol Biosci. 2022 Nov 7;9:907150. doi: 10.3389/fmolb.2022.907150. eCollection 2022.

GMMchi: gene expression clustering using Gaussian mixture modeling.

BMC Bioinformatics. 2022 Nov 2;23(1):457. doi: 10.1186/s12859-022-05006-0.
