Qiu Xing, Klebanov Lev, Yakovlev Andrei
Department of Biostatistics and Computational Biology, University of Rochester.
Stat Appl Genet Mol Biol. 2005;4:Article34. doi: 10.2202/1544-6115.1157. Epub 2005 Nov 22.
Stochastic dependence between gene expression levels in microarray data is of critical importance for methods of statistical inference that pool test statistics across genes. The empirical Bayes methodology, in both its nonparametric and parametric formulations, and closely related methods employing a two-component mixture model are typical examples. It is frequently assumed that dependence between gene expression levels (or the associated test statistics) is sufficiently weak to justify applying such methods to select differentially expressed genes. By applying resampling techniques to simulated and real biological data sets, we have studied the potential impact of correlation between gene expression levels on statistical inference based on the empirical Bayes methodology. These analyses provide evidence that the impact may be quite strong, leading to a high variance in the number of genes declared differentially expressed. The study also pinpoints the specific components of the empirical Bayes method where this effect manifests itself.
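The variance effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure and does not implement empirical Bayes: it simply counts how many genes exceed a fixed two-sample t-statistic threshold in repeated simulated data sets with no true differential expression, once with independent genes and once with genes that share a latent factor. The function name `simulate_counts`, the equicorrelation model, and all parameter values (`rho`, `n_genes`, `n_per_group`, `n_reps`, `threshold`) are assumptions chosen for illustration.

```python
import numpy as np


def simulate_counts(rho, n_genes=500, n_per_group=5, n_reps=200,
                    threshold=2.5, seed=0):
    """Count genes with |t| > threshold in each of n_reps null data sets.

    Hypothetical model: every gene is standard normal in every sample,
    and any two genes have correlation rho through one shared factor.
    """
    rng = np.random.default_rng(seed)
    n_samples = 2 * n_per_group
    counts = np.empty(n_reps, dtype=int)
    for r in range(n_reps):
        # One latent value per sample, shared by all genes.
        shared = rng.standard_normal((1, n_samples))
        # Gene-specific noise, independent across genes and samples.
        own = rng.standard_normal((n_genes, n_samples))
        x = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own

        # Two-sample pooled-variance t-statistic for each gene.
        a, b = x[:, :n_per_group], x[:, n_per_group:]
        pooled_var = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2.0
        t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(
            pooled_var * 2.0 / n_per_group)

        counts[r] = int(np.sum(np.abs(t) > threshold))
    return counts


indep = simulate_counts(rho=0.0)
corr = simulate_counts(rho=0.6)
print("independent genes: mean", indep.mean(), "variance", indep.var())
print("correlated genes:  mean", corr.mean(), "variance", corr.var())
```

Under this model each gene has the same marginal distribution for any `rho`, so the expected count is the same in both settings; only the spread of the count across data sets changes. The shared factor shifts all t-statistics together, which is one simple way correlation can inflate the variance of the number of selected genes.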