Wille Anja, Gruissem Wilhelm, Bühlmann Peter, Hennig Lars
Seminar for Statistics, ETH Zurich, CH-8092, Zurich, Switzerland.
Plant J. 2007 Nov;52(3):561-9. doi: 10.1111/j.1365-313X.2007.03227.x. Epub 2007 Aug 3.
Accurately identifying differentially expressed genes from microarray data is not a trivial task, partly because of poor variance estimates of gene expression signals. Here, after analyzing 380 replicated microarray experiments, we found that probesets have typical, distinct variances that can be estimated from a large number of microarray experiments. These probeset-specific variances depend at least in part on the function of the probed gene: genes encoding ribosomal or structural proteins often have small variances, while genes implicated in stress responses often have large variances. We used these variance estimates to develop a statistical test for differentially expressed genes called EVE (external variance estimation). On some real-world data sets for which external information from appropriate databases is available, the EVE algorithm performs better than the t-test and LIMMA. Thus, EVE helps to maximize the information gained from a typical microarray experiment. Nonetheless, only a large number of replicates can guarantee that nearly all truly differentially expressed genes are identified. However, our simulation studies suggest that even a limited number of replicates usually yields good coverage of strongly differentially expressed genes.
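The abstract does not give the EVE test statistic itself. As a minimal sketch, assuming that an externally estimated, probeset-specific variance simply replaces the noisy within-experiment sample variance, the idea can be illustrated in Python. Both functions below are hypothetical helpers written for illustration and are not the authors' EVE implementation: `pooled_variance` pools the replicate variances of one probeset across many external experiments, and `external_variance_z_test` treats that pooled value as a known variance in a two-sample z-test.

```python
import math
from statistics import mean


def pooled_variance(replicate_groups):
    """Pool within-experiment variances of one probeset (hypothetical helper).

    replicate_groups: list of lists; each inner list holds the replicate
    expression values of this probeset in one external experiment.
    Returns sum((n_k - 1) * s_k^2) / sum(n_k - 1), the classical pooled variance.
    """
    sum_sq = 0.0
    dof = 0
    for values in replicate_groups:
        n = len(values)
        if n < 2:
            continue  # a single replicate carries no variance information
        m = mean(values)
        sum_sq += sum((v - m) ** 2 for v in values)  # equals (n - 1) * s^2
        dof += n - 1
    if dof == 0:
        raise ValueError("need at least one experiment with >= 2 replicates")
    return sum_sq / dof


def external_variance_z_test(x, y, ext_var):
    """Two-sample z-test using ext_var as the known per-replicate variance.

    Assumption: unlike the t-test, no variance is estimated from x and y;
    the external estimate is plugged in directly.
    """
    se = math.sqrt(ext_var * (1 / len(x) + 1 / len(y)))
    z = (mean(x) - mean(y)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return z, p


if __name__ == "__main__":
    # Toy external data for one probeset: three experiments with replicates.
    external = [[7.0, 7.2, 6.9], [7.1, 7.3], [6.8, 7.0, 7.2, 7.0]]
    var_g = pooled_variance(external)
    z, p = external_variance_z_test([7.9, 8.1], [7.0, 7.1], var_g)
    print(f"external variance={var_g:.4f}  z={z:.2f}  p={p:.2e}")
```

The design choice this sketch illustrates is that, with only two or three replicates per condition, the sample variance used by the t-test is itself highly uncertain, whereas a variance pooled over many external experiments has many more degrees of freedom. The real EVE test may combine external and internal information differently.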