Barrera Leah, Benner Chris, Tao Yong-Chuan, Winzeler Elizabeth, Zhou Yingyao
Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, California 92121, USA.
BMC Bioinformatics. 2004 Apr 20;5:42. doi: 10.1186/1471-2105-5-42.
To identify differentially expressed genes across experimental conditions in oligonucleotide microarray experiments, existing statistical methods commonly use a summary of probe-level expression data for each probe set and compare replicates of these values across conditions using a form of the t-test or rank sum test. Here we propose the use of a statistical method that takes advantage of the built-in redundancy architecture of high-density oligonucleotide arrays.
We employ parametric and nonparametric variants of two-way analysis of variance (ANOVA) on probe-level data to account for probe-level variation, and use the false-discovery rate (FDR) to account for simultaneous testing on thousands of genes (the multiple testing problem). Using publicly available data sets, we systematically compared the performance of parametric two-way ANOVA and the nonparametric Mack-Skillings test to the t-test and Wilcoxon rank-sum test for detecting differentially expressed genes at varying levels of fold change, concentration, and sample size. Using receiver operating characteristic (ROC) curve comparisons, we observed that two-way methods with FDR control on sample sizes of 2-3 replicates exhibit the same high sensitivity and specificity as a t-test with FDR control on sample sizes of 6-9 replicates in detecting at least two-fold change.
Our results suggest that the two-way ANOVA methods using probe-level data are substantially more powerful tests for detecting differential gene expression than corresponding methods for probe-set level data.
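The probe-level analysis described above can be illustrated with a minimal sketch. The abstract does not give the exact model or preprocessing, so the code below assumes an additive two-way layout (condition and probe as factors, no interaction term), a balanced design, and log-scale intensities. The function names `condition_f_statistic` and `benjamini_hochberg` are hypothetical, and Benjamini-Hochberg is used here as one common FDR procedure, not necessarily the one the authors applied.

```python
from itertools import chain


def condition_f_statistic(data):
    """F statistic for the condition effect in an additive two-way layout.

    data[c][p] is a list of replicate log-intensities for condition c and
    probe p of a single probe set. Assumes a balanced design (every cell
    has the same number of replicates) and nonzero residual variation.
    Returns (F, df_condition, df_residual).
    """
    n_cond = len(data)
    n_probe = len(data[0])
    n_rep = len(data[0][0])

    values = [x for cond in data for cell in cond for x in cell]
    n = len(values)
    grand = sum(values) / n

    # Marginal means for each condition and each probe.
    cond_means = [sum(chain.from_iterable(cond)) / (n_probe * n_rep)
                  for cond in data]
    probe_means = [sum(x for cond in data for x in cond[p]) / (n_cond * n_rep)
                   for p in range(n_probe)]

    # Sums of squares: total = condition + probe + residual.
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_cond = n_probe * n_rep * sum((m - grand) ** 2 for m in cond_means)
    ss_probe = n_cond * n_rep * sum((m - grand) ** 2 for m in probe_means)
    ss_resid = ss_total - ss_cond - ss_probe

    df_cond = n_cond - 1
    df_resid = n - n_cond - n_probe + 1
    f_stat = (ss_cond / df_cond) / (ss_resid / df_resid)
    return f_stat, df_cond, df_resid


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, each probe set yields one F statistic. Converting it to a p-value requires the upper tail of the F distribution with `(df_condition, df_residual)` degrees of freedom (for example `scipy.stats.f.sf`, if SciPy is available), and the resulting per-gene p-values would then be passed to `benjamini_hochberg`. Because the probe factor absorbs probe-to-probe differences in affinity, every probe contributes an observation instead of being collapsed into a single summary value per array.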