A mixture model with random-effects components for clustering correlated gene-expression profiles.

Authors

Ng S K, McLachlan G J, Wang K, Ben-Tovim Jones L, Ng S-W

Affiliation

Department of Mathematics, University of Queensland, Brisbane, QLD 4072, Australia.

Publication

Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.

DOI:10.1093/bioinformatics/btl165
PMID:16675467
Abstract

MOTIVATION

The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes.
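A model of the kind described above can be sketched in generic linear mixed-model notation. The abstract gives no formulas, so every symbol here ($X$, $U$, $V$, $\beta_h$, $b_{hj}$, $c_h$, $\theta_{bh}$, $\theta_{ch}$, $A_h$, $\pi_h$) is an assumption made for illustration. Conditional on gene $j$ belonging to cluster $h$, its expression profile $y_j$ is taken to be:

```latex
\begin{align}
  y_j &= X\beta_h + U b_{hj} + V c_h + \varepsilon_{hj}, \\
  b_{hj} &\sim N(0,\ \theta_{bh} I), \qquad
  c_h \sim N(0,\ \theta_{ch} I), \qquad
  \varepsilon_{hj} \sim N(0,\ A_h), \\
  \pi_h &= \Pr(\text{gene } j \text{ belongs to cluster } h), \qquad
  \sum_{h=1}^{g} \pi_h = 1 .
\end{align}
```

In this sketch the design matrix $X$ carries the covariates (time points, or indicators for experimental classes), the gene-specific effect $b_{hj}$ absorbs variation within a gene across replicates, and the cluster-level effect $c_h$ is shared by all genes in cluster $h$, which is what induces correlation between their profiles.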

RESULTS

We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm, for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is also considered.
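To illustrate what closed-form E and M steps look like, here is a minimal sketch of EM for a plain one-dimensional normal mixture. It is not the random-effects model of the paper and not EMMIX-WIRE; the function name `em_normal_mixture`, the quantile-based initialisation and the simulated data are all assumptions made for this example.

```python
import numpy as np

def em_normal_mixture(y, n_components=2, n_iter=100):
    """Fit a 1-D normal mixture by EM; returns (weights, means, variances, resp)."""
    y = np.asarray(y, dtype=float)
    # Deterministic start (an assumption of this sketch): means at evenly
    # spaced quantiles, a common variance, and equal mixing proportions.
    probs = (np.arange(n_components) + 0.5) / n_components
    means = np.quantile(y, probs)
    variances = np.full(n_components, y.var())
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step (closed form): posterior probability of each component,
        # computed on the log scale for numerical stability.
        diff = y[:, None] - means[None, :]
        log_dens = -0.5 * (np.log(2 * np.pi * variances) + diff**2 / variances)
        log_post = np.log(weights) + log_dens
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step (closed form): responsibility-weighted proportions,
        # means and variances.
        nk = resp.sum(axis=0) + 1e-12  # guard against an empty component
        weights = nk / nk.sum()
        means = (resp * y[:, None]).sum(axis=0) / nk
        variances = (resp * (y[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-8)  # avoid degenerate spikes
    return weights, means, variances, resp

# Usage on simulated data: two well-separated groups of 200 values each.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
weights, means, variances, resp = em_normal_mixture(y)
labels = resp.argmax(axis=1)  # hard cluster assignment per observation
```

Every update is an explicit formula, so no sampling is involved and repeated runs from the same start give the same fit. The model in the abstract adds random-effects terms to each component, which would change the quantities computed in each step but not this overall alternation.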

AVAILABILITY

A Fortran program called EMMIX-WIRE (EM-based MIXture analysis WIth Random Effects) is available on request from the corresponding author.

Similar articles

1. A mixture model with random-effects components for clustering correlated gene-expression profiles. Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.
2. Graph-based consensus clustering for class discovery from gene expression data. Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.
3. A fully Bayesian model to cluster gene-expression profiles. Bioinformatics. 2005 Sep 1;21 Suppl 2:ii130-6. doi: 10.1093/bioinformatics/bti1122.
4. Clustering of change patterns using Fourier coefficients. Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.
5. Incorporating gene functions as priors in model-based clustering of microarray gene expression data. Bioinformatics. 2006 Apr 1;22(7):795-801. doi: 10.1093/bioinformatics/btl011. Epub 2006 Jan 24.
6. Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles. Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133. Epub 2008 Apr 10.
7. Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.
8. A multi-stage approach to clustering and imputation of gene expression profiles. Bioinformatics. 2007 Apr 15;23(8):998-1005. doi: 10.1093/bioinformatics/btm053. Epub 2007 Feb 18.
9. Clustering microarray gene expression data using weighted Chinese restaurant process. Bioinformatics. 2006 Aug 15;22(16):1988-97. doi: 10.1093/bioinformatics/btl284. Epub 2006 Jun 9.
10. A hidden Markov model-based approach for identifying timing differences in gene expression under different experimental factors. Bioinformatics. 2007 Apr 1;23(7):842-9. doi: 10.1093/bioinformatics/btl667. Epub 2007 Jan 19.

Cited by

1. Spectral Clustering, Bayesian Spanning Forest, and Forest Process. J Am Stat Assoc. 2024;119(547):2140-2153. doi: 10.1080/01621459.2023.2250098. Epub 2023 Sep 29.
2. Guidelines for repeated measures statistical analysis approaches with basic science research considerations. J Clin Invest. 2023 Jun 1;133(11):e171058. doi: 10.1172/JCI171058.
3. Class enumeration false positive in skew-t family of continuous growth mixture models. PLoS One. 2020 Apr 17;15(4):e0231525. doi: 10.1371/journal.pone.0231525. eCollection 2020.
4. Coordination Analysis of Gene Expression Points to the Relative Impact of Different Regulators During Endoplasmic Reticulum Stress. DNA Cell Biol. 2019 Sep;38(9):969-981. doi: 10.1089/dna.2019.4910. Epub 2019 Aug 6.
5. Repeated measures regression mixture models. Behav Res Methods. 2020 Apr;52(2):591-606. doi: 10.3758/s13428-019-01257-7.
6. Informatively clustering longitudinal microarrays using binary or survival outcome data. Commun Stat Case Stud Data Anal Appl. 2018;4(1):18-27. doi: 10.1080/23737484.2018.1455542. Epub 2018 Apr 9.
7. A Tailored Multivariate Mixture Model for Detecting Proteins of Concordant Change Among Virulent Strains of . J Am Stat Assoc. 2018;113(522):546-559. doi: 10.1080/01621459.2017.1356314. Epub 2018 Jun 12.
8. Clustering of temporal gene expression data with mixtures of mixed effects models with a penalized likelihood. Bioinformatics. 2019 Mar 1;35(5):778-786. doi: 10.1093/bioinformatics/bty696.
9. Whole-Volume Clustering of Time Series Data from Zebrafish Brain Calcium Images via Mixture Modeling. Stat Anal Data Min. 2018 Feb;11(1):5-16. doi: 10.1002/sam.11366. Epub 2017 Dec 6.
10. A multivariate finite mixture latent trajectory model with application to dementia studies. J Appl Stat. 2016;43(14):2503-2523. doi: 10.1080/02664763.2016.1141181. Epub 2016 Feb 22.