A mixture model with random-effects components for clustering correlated gene-expression profiles.

Author information

Ng S K, McLachlan G J, Wang K, Ben-Tovim Jones L, Ng S-W

Affiliation

Department of Mathematics, University of Queensland, Brisbane, QLD 4072, Australia.

Publication information

Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.

DOI:10.1093/bioinformatics/btl165

PMID:16675467

Abstract

MOTIVATION

The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes.

RESULTS

We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm, for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is also considered.
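
To illustrate what closed-form E and M steps look like, the following is a minimal sketch of EM for a plain univariate normal mixture. It is not the random-effects model of the paper and not the EMMIX-WIRE code: the function name `em_normal_mixture`, the quantile-based initialisation, the fixed iteration count and the variance floor are all assumptions made for this example only.

```python
import numpy as np

def em_normal_mixture(y, n_components=2, n_iter=100):
    """EM for a plain univariate normal mixture (illustrative; no random effects)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Assumed initialisation: equal weights, means at evenly spaced quantiles,
    # and the overall sample variance for every component.
    pi = np.full(n_components, 1.0 / n_components)
    mu = np.quantile(y, (np.arange(n_components) + 0.5) / n_components)
    var = np.full(n_components, y.var())
    for _ in range(n_iter):
        # E-step: posterior membership probabilities tau[j, h], in closed form.
        log_dens = -0.5 * (np.log(2 * np.pi * var) + (y[:, None] - mu) ** 2 / var)
        log_w = np.log(pi) + log_dens
        log_w -= log_w.max(axis=1, keepdims=True)  # stabilise before exponentiating
        tau = np.exp(log_w)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: weighted proportions, means and variances, in closed form.
        nk = tau.sum(axis=0) + 1e-12  # small offset guards against empty components
        pi = nk / nk.sum()
        mu = (tau * y[:, None]).sum(axis=0) / nk
        var = (tau * (y[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, 1e-8)  # assumed floor to avoid degenerate variances
    return pi, mu, var, tau
```

Each observation can then be assigned to the component with the largest posterior probability, for example with `tau.argmax(axis=1)`. Every update is an explicit weighted average, so no Monte Carlo approximation is involved; the abstract states that the same property holds for the random-effects extension.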

AVAILABILITY

A Fortran program called EMMIX-WIRE (EM-based MIXture analysis WIth Random Effects) is available on request from the corresponding author.
