Hu Jianhua, Wright Fred A
Department of Biostatistics and Applied Mathematics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030-4009, USA.
Biometrics. 2007 Mar;63(1):41-9. doi: 10.1111/j.1541-0420.2006.00675.x.
The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.
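To illustrate why shrinkage guards against inappropriately small variance estimates, the following minimal sketch compares an ordinary pooled-variance t-statistic with one computed from a variance shrunk toward a common value. It is not the model described in the abstract: the expression values, the gene labels, the shrinkage target (the mean of the per-gene variances), and the fixed weight of 0.5 are all assumptions chosen for illustration.

```python
import math
import statistics


def pooled_variance(x, y):
    """Pooled sample variance of two groups (equal-variance assumption)."""
    nx, ny = len(x), len(y)
    return ((nx - 1) * statistics.variance(x)
            + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)


def t_statistic(x, y, variance):
    """Two-sample t-type statistic using the supplied variance estimate."""
    se = math.sqrt(variance * (1.0 / len(x) + 1.0 / len(y)))
    return (statistics.mean(x) - statistics.mean(y)) / se


def shrink(gene_var, target_var, weight):
    """Convex combination of a gene's variance and a common target."""
    return weight * target_var + (1.0 - weight) * gene_var


# Hypothetical log-expression values, two arrays per condition.
# Gene A: tiny difference, but an accidentally tiny variance.
# Gene B: large difference with a more typical variance.
genes = {
    "A": ([8.00, 8.02], [7.90, 7.92]),
    "B": ([9.0, 9.6], [7.0, 7.4]),
}

variances = {g: pooled_variance(x, y) for g, (x, y) in genes.items()}

# Assumed shrinkage target and weight (illustrative, not estimated from a model).
target = statistics.mean(variances.values())
weight = 0.5

ordinary = {g: t_statistic(x, y, variances[g]) for g, (x, y) in genes.items()}
shrunk = {
    g: t_statistic(x, y, shrink(variances[g], target, weight))
    for g, (x, y) in genes.items()
}

for g in genes:
    print(f"gene {g}: ordinary t = {ordinary[g]:.2f}, shrunken t = {shrunk[g]:.2f}")
```

With only two arrays per group, the ordinary statistic ranks gene A above gene B purely because its variance estimate happens to be very small; after shrinkage the ranking reverses. In practice the target and weight would be estimated from the data, for example through a mean-variance relationship as the abstract describes.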