Pawitan Yudi, Calza Stefano, Ploner Alexander
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Bioinformatics. 2006 Dec 15;22(24):3025-31. doi: 10.1093/bioinformatics/btl527. Epub 2006 Oct 17.
Wide-scale correlations between genes are commonly observed in gene expression data, for both biological and technical reasons. These correlations increase the variability of the standard estimate of the false discovery rate (FDR). We highlight the false discovery proportion (FDP), i.e. the actual proportion of false positives among the genes called significant in a given dataset, rather than its expectation, the FDR, as the appropriate quantity for assessing differential expression in microarray data. We demonstrate the deleterious effects of correlation on FDP estimation and propose an improved estimation method that accounts for these correlations.
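The effect of correlation on FDP estimation can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' simulation design. It induces an equicorrelation `rho` between genes through a single shared latent factor and declares genes significant when |z| exceeds a fixed threshold. It then compares the realised FDP in each dataset with a standard plug-in estimate: the expected number of null rejections divided by the observed number of rejections, assuming the number of true nulls `m0` is known. The function name `simulate` and all parameter values are assumptions.

```python
import math
import numpy as np


def simulate(n_genes=2000, n_alt=200, effect=3.0, rho=0.0,
             threshold=2.5, n_reps=500, seed=0):
    """Hypothetical toy model: return (realised FDP, plug-in FDR estimate) per dataset."""
    rng = np.random.default_rng(seed)
    is_null = np.arange(n_genes) >= n_alt          # genes n_alt.. are true nulls
    m0 = int(is_null.sum())                        # oracle: m0 assumed known
    # Two-sided tail probability of N(0, 1) beyond the threshold.
    p_tail = math.erfc(threshold / math.sqrt(2.0))
    fdp = np.empty(n_reps)
    fdr_hat = np.empty(n_reps)
    for r in range(n_reps):
        # Assumed correlation model: one shared factor per dataset gives every
        # pair of genes correlation rho; null z-scores stay marginally N(0, 1).
        common = rng.standard_normal()
        z = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.standard_normal(n_genes)
        z[~is_null] += effect                      # shift the truly DE genes
        rejected = np.abs(z) > threshold
        n_rej = max(int(rejected.sum()), 1)        # avoid division by zero
        fdp[r] = (rejected & is_null).sum() / n_rej    # V / R in this dataset
        fdr_hat[r] = min(m0 * p_tail / n_rej, 1.0)     # expected V / observed R
    return fdp, fdr_hat


for rho in (0.0, 0.3):
    fdp, fdr_hat = simulate(rho=rho)
    print(f"rho={rho}: sd(FDP)={fdp.std():.3f}, "
          f"mean |FDR_hat - FDP|={np.abs(fdr_hat - fdp).mean():.3f}")
```

In this toy setup the FDP varies much more from dataset to dataset when genes are correlated, and the plug-in estimate, which ignores the correlation, tracks the realised FDP less closely.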
Using the singular value decomposition, we analyse how the distribution of test statistics varies across permutations. The results suggest a latent FDR model that accounts for the effects of correlation and is statistically closer to the FDP. We develop a procedure for estimating the latent FDR (ELF) based on a Poisson regression model.
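This minimal sketch of the permutation-and-SVD idea assumes a two-group design, Welch-type t-statistics and a fixed set of histogram bins. These are all hypothetical choices, since the abstract does not specify them. Each permutation of the group labels yields one histogram of test statistics. Stacking these gives a permutations × bins count matrix, and the SVD of its column-centred version shows whether a few patterns dominate the variation. The helper names `permutation_count_matrix` and `dominant_patterns` are illustrative only. The sketch covers the SVD step alone; the Poisson regression used by ELF is not shown.

```python
import numpy as np


def permutation_count_matrix(x, labels, bins, n_perm=200, seed=0):
    """Hypothetical helper: histogram counts of t-statistics for each label permutation.

    x: genes x samples expression matrix; labels: 0/1 group vector.
    Returns an (n_perm x n_bins) matrix of counts.
    """
    rng = np.random.default_rng(seed)
    counts = np.empty((n_perm, len(bins) - 1))
    for b in range(n_perm):
        perm = rng.permutation(labels)
        g0, g1 = x[:, perm == 0], x[:, perm == 1]
        # Welch-type t-statistic per gene (an assumed choice of statistic).
        se = np.sqrt(g0.var(axis=1, ddof=1) / g0.shape[1]
                     + g1.var(axis=1, ddof=1) / g1.shape[1])
        t = (g1.mean(axis=1) - g0.mean(axis=1)) / se
        # Clip so extreme statistics land in the outermost bins.
        counts[b], _ = np.histogram(np.clip(t, bins[0], bins[-1]), bins=bins)
    return counts


def dominant_patterns(counts, k=2):
    """Hypothetical helper: SVD of the column-centred count matrix."""
    centred = counts - counts.mean(axis=0)         # remove the average histogram
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    share = s**2 / np.sum(s**2)                    # variance share per component
    return share[:k], vt[:k]                       # rows of vt: patterns over bins


rng = np.random.default_rng(1)
n_genes, n_samples = 1000, 20
labels = np.repeat([0, 1], n_samples // 2)
bins = np.linspace(-6, 6, 25)
noise = rng.standard_normal((n_genes, n_samples))
# Assumed correlation structure: every gene loads on one latent sample-level factor.
x_corr = noise + np.outer(rng.standard_normal(n_genes), rng.standard_normal(n_samples))

for name, x in (("independent", noise), ("correlated", x_corr)):
    share, _ = dominant_patterns(permutation_count_matrix(x, labels, bins))
    print(name, np.round(share, 2))
```

In this toy example, the first component explains a noticeably larger share of the between-permutation variation for the correlated data than for the independent data.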
For simulated data based on the correlation structure of real datasets, we find that ELF performs substantially better than the standard FDR approach in estimating the FDP. We illustrate the use of ELF in the analysis of breast cancer and lymphoma data.
R code to perform ELF is available at http://www.meb.ki.se/~yudpaw.