Fan Jianqing, Han Xu, Gu Weijie
Department of Operations Research & Financial Engineering, Princeton University, Princeton, NJ 08544, USA; Honorary Professor, School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China.
Department of Statistics, University of Florida, Gainesville, FL 32606.
J Am Stat Assoc. 2012;107(499):1019-1035. doi: 10.1080/01621459.2012.720478.
Multiple hypothesis testing is a fundamental problem in high-dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to determine whether any SNPs are associated with certain traits, and those tests are correlated. When test statistics are correlated, controlling false discoveries under arbitrary dependence becomes very challenging. In this paper, we propose a novel method for handling an arbitrary dependence structure, based on principal factor approximation, which subtracts the common dependence and significantly weakens the correlation structure. We derive an approximate expression for the false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used, and provide a consistent estimate of the realized FDP. This result has important applications in controlling the false discovery rate (FDR) and FDP. Our estimate of the realized FDP compares favorably with the approach of Efron (2007), as demonstrated in simulated examples. Our approach is further illustrated by real data applications. We also propose a dependence-adjusted procedure, which is more powerful than the fixed-threshold procedure.
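To make the notion of realized FDP under a common threshold concrete, the following is a minimal illustrative sketch, not the paper's principal factor approximation. It assumes a toy one-factor (equicorrelated) model for the z-statistics, and it assumes the common factor is observed exactly (an oracle), whereas in practice it would have to be estimated. The function names, the model, and all parameter values (`p`, `p1`, `rho`, `signal`, `t_z`) are hypothetical choices made only for illustration. The sketch compares how much the realized FDP varies across replications before and after the common factor is subtracted.

```python
import numpy as np


def realized_fdp(z, is_null, t_z):
    """Realized FDP = V / R when rejecting every test with |z_i| > t_z."""
    reject = np.abs(z) > t_z
    n_reject = int(reject.sum())               # R: total rejections
    n_false = int((reject & is_null).sum())    # V: rejections of true nulls
    return n_false / n_reject if n_reject > 0 else 0.0


def simulate_one_factor(p, p1, rho, signal, rng):
    """Toy model (an assumption): z_i = mu_i + sqrt(rho)*W + sqrt(1-rho)*eps_i."""
    mu = np.zeros(p)
    mu[:p1] = signal                           # first p1 hypotheses are non-null
    w = rng.standard_normal()                  # common factor shared by all tests
    eps = rng.standard_normal(p)               # independent idiosyncratic noise
    z = mu + np.sqrt(rho) * w + np.sqrt(1.0 - rho) * eps
    return z, mu == 0, w


rng = np.random.default_rng(0)
p, p1, rho, signal, t_z = 2000, 100, 0.5, 3.0, 2.5   # hypothetical settings

fdp_raw, fdp_adj = [], []
for _ in range(200):
    z, is_null, w = simulate_one_factor(p, p1, rho, signal, rng)
    fdp_raw.append(realized_fdp(z, is_null, t_z))
    # Oracle adjustment: subtract the (here known) common factor and rescale
    # so that the remaining noise has unit variance.
    z_adj = (z - np.sqrt(rho) * w) / np.sqrt(1.0 - rho)
    fdp_adj.append(realized_fdp(z_adj, is_null, t_z))

fdp_raw = np.array(fdp_raw)
fdp_adj = np.array(fdp_adj)
print("raw:      mean FDP = %.3f, sd = %.3f" % (fdp_raw.mean(), fdp_raw.std()))
print("adjusted: mean FDP = %.3f, sd = %.3f" % (fdp_adj.mean(), fdp_adj.std()))
```

In this toy setting the realized FDP of the raw statistics swings widely from one replication to the next, because it depends on the realized value of the common factor, while the adjusted statistics are independent and their FDP is much more stable. This is the intuition behind targeting the realized FDP and removing common dependence before thresholding.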