A mixture model-based approach to the clustering of microarray expression data.

Author information

McLachlan G J, Bean R W, Peel D

Affiliation

Department of Mathematics, University of Queensland, Brisbane, Queensland 4072, Australia.

Publication information

Bioinformatics. 2002 Mar;18(3):413-22. doi: 10.1093/bioinformatics/18.3.413.

Abstract

MOTIVATION

This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes.
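The gene-selection step described above can be illustrated with a minimal sketch. This is not EMMIX-GENE itself: it assumes normal mixtures from scikit-learn in place of the mixtures of t distributions the abstract names, and the function names, the `genes x tissues` array layout, and both threshold values are hypothetical placeholders chosen for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def lrt_statistic(x, seed=0):
    """Return -2 log(lambda) for one versus two components, and the smaller cluster size.

    Assumption: normal components stand in for the t components used by EMMIX-GENE.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    n = x.shape[0]
    one = GaussianMixture(n_components=1, random_state=seed).fit(x)
    two = GaussianMixture(n_components=2, n_init=5, random_state=seed).fit(x)
    # score() is the mean log-likelihood per sample, so scale by n.
    stat = 2.0 * n * (two.score(x) - one.score(x))
    # Size of the smaller of the two clusters implied by the two-component fit.
    min_size = int(np.bincount(two.predict(x), minlength=2).min())
    return stat, min_size


def select_genes(expr, stat_threshold=8.0, size_threshold=8):
    """Select genes from a genes x tissues array.

    A gene is kept when its statistic exceeds stat_threshold and its smaller
    cluster has at least size_threshold tissues. Both defaults are illustrative
    placeholders, not values taken from the abstract.
    """
    results = [lrt_statistic(row) for row in expr]
    stats = np.array([r[0] for r in results])
    sizes = np.array([r[1] for r in results])
    keep = np.flatnonzero((stats > stat_threshold) & (sizes >= size_threshold))
    # Return retained genes with the largest statistic first, plus all statistics.
    return keep[np.argsort(-stats[keep])], stats
```

Calling `select_genes(expr)` on an expression matrix with one row per gene returns the indices of the retained genes and the statistic for every gene. The cluster-size threshold is what filters out genes whose large statistic comes from one or two outlying tissues rather than from a real split of the samples. The later step, fitting mixtures of factor analyzers to the retained genes, is not covered by this sketch.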

RESULTS

The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.

AVAILABILITY

EMMIX-GENE is available at http://www.maths.uq.edu.au/~gjm/emmix-gene/
