Pawitan Yudi, Michiels Stefan, Koscielny Serge, Gusnanto Arief, Ploner Alexander
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden.
Bioinformatics. 2005 Jul 1;21(13):3017-24. doi: 10.1093/bioinformatics/bti448. Epub 2005 Apr 19.
In microarray data studies, most researchers are keenly aware of the potentially high rate of false positives and the need to control it. One key statistical shift is the move away from the well-known P-value to the false discovery rate (FDR). Perhaps less attention has been paid to sensitivity, or the associated false negative rate (FNR). The purpose of this paper is to explain in simple terms why the shift from P-value to FDR for statistical assessment of microarray data is necessary, to elucidate the determining factors of FDR and, for a two-sample comparative study, to discuss its control via sample size at the design stage.
We use a mixture model, involving differentially expressed (DE) and non-DE genes, that captures the most common problem of finding DE genes. Factors determining FDR are (1) the proportion of truly differentially expressed genes, (2) the distribution of the true differences, (3) measurement variability and (4) sample size. Many current small microarray studies are plagued with large FDR, but controlling FDR alone can lead to unacceptably large FNR. In evaluating a design of a microarray study, sensitivity or FNR curves should be computed routinely together with FDR curves. Under certain assumptions, the FDR and FNR curves coincide, thus simplifying the choice of sample size for controlling the FDR and FNR jointly.
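As a rough illustration of how the four factors listed above interact, the following Python sketch computes FDR and FNR for a two-component mixture of DE and non-DE genes. It is not the paper's own calculation and rests on simplifying assumptions: a two-sided z-test with known standard deviation `sigma`, one fixed true difference `effect` shared by all DE genes (the abstract refers to a distribution of true differences), a per-gene significance level `alpha`, and FNR taken to be 1 minus sensitivity. The function name `fdr_and_fnr` and all parameter names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def fdr_and_fnr(n_per_group, prop_de, effect, sigma=1.0, alpha=0.05):
    """Approximate FDR and FNR for a two-sample study (illustrative only).

    Assumptions (not taken from the paper): two-sided z-test, known sigma,
    and the same true difference `effect` for every DE gene.
    """
    # Mean shift of the z-statistic for a DE gene; the standard error of a
    # difference of two group means is sigma * sqrt(2 / n_per_group).
    shift = effect / (sigma * (2.0 / n_per_group) ** 0.5)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)

    # Sensitivity: probability that a DE gene is declared significant.
    sensitivity = (1.0 - _STD_NORMAL.cdf(z_crit - shift)) + _STD_NORMAL.cdf(-z_crit - shift)

    # Expected fractions of all genes that end up as false or true positives.
    false_pos = (1.0 - prop_de) * alpha
    true_pos = prop_de * sensitivity

    fdr = false_pos / (false_pos + true_pos)
    fnr = 1.0 - sensitivity  # assumed definition: 1 - sensitivity
    return fdr, fnr


if __name__ == "__main__":
    # Hypothetical design: 5% DE genes, true difference of one sigma.
    for n in (5, 10, 20, 40):
        fdr, fnr = fdr_and_fnr(n, prop_de=0.05, effect=1.0)
        print(f"n per group = {n:3d}  FDR = {fdr:.2f}  FNR = {fnr:.2f}")
```

Under these assumptions, both FDR and FNR fall as the per-group sample size grows, and FDR rises as the proportion of DE genes shrinks. Plotting the two quantities against sample size gives the kind of FDR and FNR curves the abstract recommends computing at the design stage.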