Variable selection for discriminant analysis with Markov random field priors for the analysis of microarray data.

Affiliation

Department of Statistics, Rice University, Houston, TX 77005, USA.

Publication information

Bioinformatics. 2011 Feb 15;27(4):495-501. doi: 10.1093/bioinformatics/btq690. Epub 2010 Dec 14.

Abstract

MOTIVATION

Discriminant analysis is an effective tool for the classification of experimental units into groups. Here, we consider the typical problem of classifying subjects according to phenotypes via gene expression data and propose a method that incorporates variable selection into the inferential procedure, for the identification of the important biomarkers. To achieve this goal, we build upon a conjugate normal discriminant model, both linear and quadratic, and include a stochastic search variable selection procedure via an MCMC algorithm. Furthermore, we incorporate into the model prior information on the relationships among the genes as described by a gene-gene network. We use a Markov random field (MRF) prior to map the network connections among genes. Our prior model assumes that neighboring genes in the network are more likely to have a joint effect on the relevant biological processes.
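The contrast between an MRF prior and independent Bernoulli priors can be illustrated with a minimal sketch. It assumes a common Ising-type form for the prior on the binary inclusion vector, p(gamma) proportional to exp(d * sum_j gamma_j + f * sum over edges of gamma_i * gamma_j), which may differ from the exact parametrization used in the paper. The function name, the toy network, and the values of `d` and `f` are all hypothetical and chosen only for illustration.

```python
import math

def mrf_conditional_inclusion_prob(j, gamma, neighbors, d, f):
    """P(gamma_j = 1 | neighbors' gamma) under an assumed Ising-type MRF prior.

    d controls overall sparsity; f >= 0 controls how strongly selected
    neighbors raise the prior probability of selecting gene j.
    """
    # Count the neighbors of gene j that are currently selected.
    active_neighbors = sum(gamma[i] for i in neighbors[j])
    # Conditional log-odds implied by the joint Ising-type prior.
    logit = d + f * active_neighbors
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical toy network: genes 0-1-2 form a chain, gene 3 is isolated.
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: []}
gamma = [1, 0, 1, 0]   # current inclusion indicators
d, f = -2.0, 1.5       # illustrative hyperparameters, not from the paper

# Gene 1 has two selected neighbors, so its prior inclusion odds increase.
p_connected = mrf_conditional_inclusion_prob(1, gamma, neighbors, d, f)
# Gene 3 has no neighbors, so only the sparsity term d applies.
p_isolated = mrf_conditional_inclusion_prob(3, gamma, neighbors, d, f)
# Setting f = 0 removes the network term: independent Bernoulli priors.
p_bernoulli = mrf_conditional_inclusion_prob(1, gamma, neighbors, d, 0.0)

print(p_connected, p_isolated, p_bernoulli)
```

In this sketch, setting `f = 0` makes every gene's prior inclusion probability equal to the logistic transform of `d`, which is the independent Bernoulli case that the MRF prior is compared against.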

RESULTS

We use simulated data to assess the performance of our method. In particular, we compare the MRF prior with the case where independent Bernoulli priors are chosen for the individual predictors. We also illustrate the method on benchmark gene expression datasets. Our simulation studies show that employing the MRF prior improves selection accuracy. In real data applications, in addition to identifying markers and improving prediction accuracy, we show how integrating existing biological knowledge into the prior model increases the ability to identify genes with strong discriminatory power and also aids the interpretation of the results.
