The latent process decomposition of cDNA microarray data sets.

Author information

Rogers Simon, Girolami Mark, Campbell Colin, Breitling Rainer

Affiliation

Bioinformatics Research Centre, Department of Computing Science, A416, Fourth Floor, Davidson Building, University of Glasgow, Glasgow G12 8QQ, Scotland, United Kingdom.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2005 Apr-Jun;2(2):143-56. doi: 10.1109/TCBB.2005.29.

Abstract

We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) that enables the probabilistic analysis of cDNA microarray data, and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called Latent Process Decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, it can objectively assess the optimal number of sample clusters. Second, it represents samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced-space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster; for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show that this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several cancer microarray data sets. For these data sets, it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision, in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes known to be medically significant. To demonstrate its wider applicability, we also report its performance on a microarray data set for yeast.
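The abstract describes LPD only at a high level. To make the idea of a "combinatorial mixture over latent processes" concrete, here is a minimal sketch under an assumed LDA-style structure with Gaussian emissions: each sample draws process proportions theta from a Dirichlet prior, and each gene's expression value comes from a per-gene, per-process Gaussian whose process is chosen according to theta. The model structure, the functions `sample_dirichlet`, `generate_sample`, `log_normal`, `log_likelihood`, and all parameter values are hypothetical illustrations. This is not the authors' software from the URL above, and it omits the variational parameter estimation entirely.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one vector from Dirichlet(alpha) by normalising Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_sample(alpha, mu, sigma, rng):
    """Generate one sample under the ASSUMED (hypothetical) LPD-style model.

    alpha:     Dirichlet prior over the K latent processes (length K).
    mu, sigma: per-gene, per-process Gaussian mean / std (G x K nested lists).
    Returns the sample's process proportions theta and G expression values.
    """
    theta = sample_dirichlet(alpha, rng)
    processes = range(len(alpha))
    expression = []
    for g in range(len(mu)):
        # Each gene picks its own process, so one sample mixes several processes.
        k = rng.choices(processes, weights=theta)[0]
        expression.append(rng.gauss(mu[g][k], sigma[g][k]))
    return theta, expression


def log_normal(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def log_likelihood(expression, theta, mu, sigma):
    """log p(e | theta) = sum_g log sum_k theta_k N(e_g; mu_gk, sigma_gk^2).

    The inner sum over processes uses log-sum-exp to avoid underflow;
    processes with zero weight contribute nothing and are skipped.
    """
    total = 0.0
    for g, e in enumerate(expression):
        terms = [math.log(t) + log_normal(e, mu[g][k], sigma[g][k])
                 for k, t in enumerate(theta) if t > 0.0]
        m = max(terms)
        total += m + math.log(sum(math.exp(x - m) for x in terms))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [1.0, 1.0]                          # two hypothetical processes
    mu = [[0.0, 2.0], [1.0, -1.0], [0.5, 0.5]]  # three hypothetical genes
    sigma = [[0.5, 0.5] for _ in mu]
    theta, e = generate_sample(alpha, mu, sigma, rng)
    print("theta:", theta)
    print("expression:", e)
    print("log-likelihood:", log_likelihood(e, theta, mu, sigma))
```

Under this assumed structure, restricting theta to a one-hot vector would reduce the model to hard assignment of a sample to a single cluster, which is the contrast the third advantage listed above draws: here each gene within a sample can be explained by a different process.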
