Conlon Erin M
Department of Mathematics and Statistics, University of Massachusetts, 710 North Pleasant Street, Amherst, MA 01003-9305, USA.
Funct Integr Genomics. 2008 Feb;8(1):43-53. doi: 10.1007/s10142-007-0058-3. Epub 2007 Sep 19.
The increasing availability of microarray data has created a need for statistical methods that integrate findings across studies. A common goal of microarray analysis is to identify genes that are differentially expressed between two conditions, such as treatment vs control. A recent Bayesian meta-analysis model used a prior distribution for the mean log-expression ratios that was a mixture of two normal distributions. This model centered the prior distribution of differential expression at zero and separated genes into only two groups: expressed and nonexpressed. Here, we introduce a Bayesian three-component truncated normal mixture prior model that assigns prior distributions to the differentially expressed genes more flexibly and produces three groups of genes: upregulated, downregulated, and nonexpressed. In simulations of two and five studies, the three-component model outperformed the two-component model on three comparison measures. When analyzing biological data from Bacillus subtilis, we found that, at the same levels of posterior probability of differential expression, the three-component model discovered more genes and omitted fewer genes than the two-component model, and that it discovered more genes at fixed thresholds of Bayesian false discovery. We assumed that the data sets were produced from the same microarray platform and were prescaled.
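To make the idea of a three-component truncated normal mixture prior concrete, the following is a minimal sketch of drawing mean log-expression ratios from such a prior. The abstract does not give the model's parameterization, so everything specific here is an assumption for illustration only: the function names, the mixture weights, the component means and standard deviations, and the choice of a narrow normal centered at zero for the nonexpressed group. It is not the authors' implementation.

```python
import math
import random


def sample_truncated_normal(mu, sigma, lower, upper, rng):
    """Rejection-sample a Normal(mu, sigma) restricted to the open interval (lower, upper)."""
    while True:
        x = rng.gauss(mu, sigma)
        if lower < x < upper:
            return x


def sample_prior_mean(weights, rng, mu_up=1.0, mu_down=-1.0,
                      sigma=0.5, sigma_null=0.05):
    """Draw (label, theta) from a hypothetical three-component mixture prior.

    weights = (p_down, p_null, p_up) must sum to 1. All default values are
    illustrative assumptions, not values taken from the paper.
    """
    p_down, p_null, _p_up = weights
    u = rng.random()
    if u < p_down:
        # Downregulated: normal truncated to the negative half-line.
        return "down", sample_truncated_normal(mu_down, sigma, -math.inf, 0.0, rng)
    if u < p_down + p_null:
        # Nonexpressed: assumed here to be a narrow normal centered at zero.
        return "null", rng.gauss(0.0, sigma_null)
    # Upregulated: normal truncated to the positive half-line.
    return "up", sample_truncated_normal(mu_up, sigma, 0.0, math.inf, rng)


rng = random.Random(0)
draws = [sample_prior_mean((0.1, 0.8, 0.1), rng) for _ in range(10000)]
```

In this sketch the truncation is what separates the two differentially expressed groups: draws labeled `up` are always positive and draws labeled `down` are always negative, whereas a two-component prior centered at zero cannot distinguish the direction of regulation.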