Newton Michael A, Noueiry Amine, Sarkar Deepayan, Ahlquist Paul
Department of Statistics, University of Wisconsin-Madison, 1210 West Dayton St, Madison, WI 53706-1685, USA.
Biostatistics. 2004 Apr;5(2):155-76. doi: 10.1093/biostatistics/5.2.155.
Mixture modeling provides an effective approach to the differential expression problem in microarray data analysis. Methods based on fully parametric mixture models are available, but lack of fit in some examples indicates that more flexible models may be beneficial. Existing, more flexible, mixture models work at the level of one-dimensional gene-specific summary statistics, and so when there are relatively few measurements per gene these methods may not provide sensitive detectors of differential expression. We propose a hierarchical mixture model to provide methodology that is both sensitive in detecting differential expression and sufficiently flexible to account for the complex variability of normalized microarray data. EM-based algorithms are used to fit both parametric and semiparametric versions of the model. We restrict attention to the two-sample comparison problem; an experiment involving Affymetrix microarrays and yeast translation provides the motivating case study. Gene-specific posterior probabilities of differential expression form the basis of statistical inference; they define short gene lists and false discovery rates. Compared to several competing methodologies, the proposed methodology exhibits good operating characteristics in a simulation study, on the analysis of spike-in data, and in a cross-validation calculation.
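The abstract does not spell out the model, so the following is a minimal sketch and not the authors' implementation. It assumes a Gamma-Gamma hierarchical form, which is not stated in the abstract: each measurement is Gamma with shape a and a gene-specific rate lam, and lam has a Gamma(a0, nu) prior. Under equivalent expression (EE) both conditions share one rate. Under differential expression (DE) each condition has its own rate. The hyperparameters A_SHAPE, A0_SHAPE and NU_RATE are hypothetical fixed values rather than estimates, and EM updates only the DE proportion p. The paper also fits the hyperparameters and a semiparametric version, and this sketch does neither. The gene-list rule is also an assumption: a list's false discovery rate is estimated as the average of (1 - posterior DE probability) over the genes it contains. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical, fixed hyperparameters (assumed for illustration, not estimated).
A_SHAPE = 10.0  # shape a of each measurement: x ~ Gamma(a, rate=lam)
A0_SHAPE = 1.0  # shape a0 of the prior on the gene-specific rate lam
NU_RATE = 1.0   # rate nu of that prior: lam ~ Gamma(a0, rate=nu)


def log_marginal(x: Sequence[float], a: float, a0: float, nu: float) -> float:
    """Log density of positive values x sharing one rate lam, with lam integrated out."""
    n, s = len(x), sum(x)
    return ((a - 1.0) * sum(math.log(v) for v in x) - n * math.lgamma(a)
            + a0 * math.log(nu) - math.lgamma(a0)
            + math.lgamma(n * a + a0) - (n * a + a0) * math.log(nu + s))


def posterior_de(x1: Sequence[float], x2: Sequence[float], p: float,
                 a: float, a0: float, nu: float) -> float:
    """Posterior probability that a gene is differentially expressed (DE)."""
    # EE: both conditions share a single rate; DE: each condition has its own rate.
    log_f0 = log_marginal(list(x1) + list(x2), a, a0, nu)
    log_f1 = log_marginal(x1, a, a0, nu) + log_marginal(x2, a, a0, nu)
    l0, l1 = math.log(1.0 - p) + log_f0, math.log(p) + log_f1
    m = max(l0, l1)  # subtract the max before exponentiating to avoid overflow
    return math.exp(l1 - m) / (math.exp(l0 - m) + math.exp(l1 - m))


def em_mixing_proportion(genes: List[Tuple[Sequence[float], Sequence[float]]],
                         a: float, a0: float, nu: float,
                         p: float = 0.5, iters: int = 200,
                         tol: float = 1e-8) -> Tuple[float, List[float]]:
    """EM for the DE proportion p with the component densities held fixed."""
    for _ in range(iters):
        # E-step: posterior DE probability of every gene under the current p.
        post = [posterior_de(x1, x2, p, a, a0, nu) for x1, x2 in genes]
        # M-step: the new p is the average posterior; clamp so logs stay finite.
        p_new = min(max(sum(post) / len(post), 1e-6), 1.0 - 1e-6)
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    post = [posterior_de(x1, x2, p, a, a0, nu) for x1, x2 in genes]
    return p, post


def fdr_gene_list(post: Sequence[float], alpha: float) -> List[int]:
    """Longest top-ranked gene list whose estimated FDR, the mean of
    (1 - posterior DE probability) over the list, is at most alpha."""
    order = sorted(range(len(post)), key=lambda g: post[g], reverse=True)
    best, cum = 0, 0.0
    for k, g in enumerate(order, start=1):
        cum += 1.0 - post[g]
        if cum / k <= alpha:
            best = k
    return order[:best]


# Toy data: (condition 1, condition 2) measurements for four genes.
EXAMPLE_GENES = [
    ([1.0, 1.1, 0.9], [1.0, 0.95, 1.05]),  # similar levels
    ([1.0, 1.1, 0.9], [5.0, 5.2, 4.8]),    # clear shift up
    ([2.0, 2.1, 1.9], [2.05, 1.95, 2.0]),  # similar levels
    ([4.0, 4.2, 3.8], [0.8, 0.9, 0.7]),    # clear shift down
]

if __name__ == "__main__":
    p_hat, post = em_mixing_proportion(EXAMPLE_GENES, A_SHAPE, A0_SHAPE, NU_RATE)
    print(round(p_hat, 3), [round(q, 3) for q in post])
    print(fdr_gene_list(post, alpha=0.05))
```

The genes are ranked by decreasing posterior probability, so the running mean of (1 - posterior) never decreases. The scan could therefore stop at the first list that exceeds alpha. The longest qualifying list is the same either way.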