Jung Yoon-Young, Oh Man-Suk, Shin Dong Wan, Kang Seung-Ho, Oh Hyun Sook
Department of Statistics, Ewha Womans University, Seoul 120-750, Korea.
Biom J. 2006 Jun;48(3):435-50. doi: 10.1002/bimj.200410230.
A Bayesian model-based clustering approach is proposed for identifying differentially expressed genes in meta-analysis. A Bayesian hierarchical model is used as a scientific tool for combining information from different studies, and a mixture prior is used to separate differentially expressed genes from non-differentially expressed genes. Posterior estimation of the parameters and missing observations is carried out with a simple Markov chain Monte Carlo method. From the estimated mixture model, useful measures of significance, such as the Bayesian false discovery rate (FDR), the local FDR (Efron et al., 2001), and the integration-driven discovery rate (IDR; Choi et al., 2003), can be easily computed. The model-based approach is also compared with commonly used permutation methods, and it is shown to be superior to the permutation methods when under-expressed genes greatly outnumber over-expressed genes, or vice versa. The proposed method is applied to four publicly available prostate cancer gene expression data sets and to simulated data sets.
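To illustrate how significance measures can be read off a fitted two-component mixture, here is a minimal Python sketch. It is not the authors' implementation: the function names, the standard normal null component, and the normal alternative component are assumptions made for illustration only, and the abstract does not specify the paper's actual model. The local FDR of a gene is taken as its posterior probability of belonging to the non-differentially expressed component, and the Bayesian FDR of a selected gene list is the average of those probabilities over the list.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(score, pi0, alt_mean, alt_sd):
    """Posterior probability that a gene is NOT differentially expressed.

    Assumption (illustrative only): the null component is N(0, 1) and the
    alternative component is N(alt_mean, alt_sd^2); pi0 is the null weight.
    """
    null_part = pi0 * normal_pdf(score, 0.0, 1.0)
    alt_part = (1.0 - pi0) * normal_pdf(score, alt_mean, alt_sd)
    return null_part / (null_part + alt_part)


def bayesian_fdr_select(null_probs, alpha=0.05):
    """Pick the largest gene list whose Bayesian FDR stays at or below alpha.

    null_probs[i] is the posterior null probability (local FDR) of gene i.
    Genes are added in order of increasing null probability, so the running
    mean (the Bayesian FDR of the list) never decreases.
    Returns (selected_indices, estimated_fdr).
    """
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    selected = []
    running_sum = 0.0
    fdr = 0.0
    for i in order:
        candidate = (running_sum + null_probs[i]) / (len(selected) + 1)
        if candidate > alpha:
            break
        running_sum += null_probs[i]
        selected.append(i)
        fdr = candidate
    return selected, fdr
```

In practice the posterior null probabilities would come from the MCMC output (for example, the fraction of draws in which a gene is assigned to the null component) rather than from plug-in parameter values as in `local_fdr`.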