Li Shuying S, Bigler Jeannette, Lampe Johanna W, Potter John D, Feng Ziding
Cancer Prevention Research Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109-1024, USA.
Stat Med. 2005 Aug 15;24(15):2267-80. doi: 10.1002/sim.2119.
Microarrays are increasingly used to identify genes that are truly differentially expressed in tissues under different conditions. Planning such studies requires establishing a sample size that ensures adequate statistical power. For microarray analyses, the false discovery rate (FDR) is considered an appropriate error measure, and several FDR-controlling procedures have been developed. How these procedures perform in such analyses has not been evaluated thoroughly under realistic assumptions. Before a method of determining sample sizes for these procedures can be developed, it must be established whether they really control the FDR below the pre-specified level, so that the determined sample size indeed provides adequate power. To answer this question, we first conducted simulation studies. Our simulation results showed that these procedures control the FDR in most situations but fail to keep it below the pre-specified level when the proportion of positive genes is small, which is the most likely scenario. Thus, these existing procedures can overestimate the power and underestimate the required sample size. Accordingly, we developed a simulation-based method to provide more accurate estimates of power and sample size.
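The general idea of estimating realized FDR and power by simulation, then searching for the smallest adequate sample size, can be sketched as follows. This is a minimal illustration and not the authors' method: the abstract does not name the procedures or simulation settings, so the sketch assumes the Benjamini-Hochberg step-up procedure, independent genes, a two-group comparison with known unit variance (so each gene reduces to a z-statistic), and a common standardized effect size. The function names and the parameters `m`, `pi1`, `delta`, `q`, and `reps` are hypothetical. Under these idealized assumptions Benjamini-Hochberg is known to control the FDR, so the sketch is not expected to reproduce the loss of control reported above; it only shows the mechanics of the simulation.

```python
import math
import random


def benjamini_hochberg(pvalues, q):
    """Return the set of indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Find the largest rank k with p_(k) <= q * k / m; reject the k smallest p-values.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * rank / m:
            cutoff = rank
    return set(order[:cutoff])


def simulate(n_per_group, m=1000, pi1=0.05, delta=1.0, q=0.05, reps=200, seed=0):
    """Estimate realized FDR and average power by Monte Carlo.

    Assumptions (illustrative only): genes are independent, variance is known
    and equal to 1, and every truly differential gene has the same
    standardized effect size `delta`.
    """
    rng = random.Random(seed)
    m1 = max(1, round(m * pi1))                   # number of truly differential genes
    shift = delta * math.sqrt(n_per_group / 2.0)  # mean of the two-sample z-statistic
    fdp_sum = 0.0
    power_sum = 0.0
    for _ in range(reps):
        pvals = []
        for g in range(m):
            # Genes with index below m1 are the true positives.
            z = rng.gauss(shift if g < m1 else 0.0, 1.0)
            # Two-sided p-value of a standard normal test statistic.
            pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))
        rejected = benjamini_hochberg(pvals, q)
        true_pos = sum(1 for g in rejected if g < m1)
        false_pos = len(rejected) - true_pos
        fdp_sum += false_pos / max(len(rejected), 1)  # false discovery proportion
        power_sum += true_pos / m1
    return fdp_sum / reps, power_sum / reps


def smallest_n(target_power, n_grid, **kwargs):
    """Return (n, fdr, power) for the first n in n_grid reaching target_power, else None."""
    for n in n_grid:
        fdr, power = simulate(n, **kwargs)
        if power >= target_power:
            return n, fdr, power
    return None


if __name__ == "__main__":
    result = smallest_n(0.8, range(5, 60, 5), m=500, reps=100)
    print(result)
```

The search reports the realized FDR alongside the power so that a sample size is accepted only when both can be inspected. To study settings closer to real microarray data, the z-statistics could be replaced with t-statistics computed from simulated expression values with estimated variances and correlated genes; those choices are left open here because the abstract does not specify them.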