Hu James X, Zhao Hongyu, Zhou Harrison H
Department of Statistics, Yale University, New Haven, CT 06511.
J Am Stat Assoc. 2010 Sep 1;105(491):1215-1227. doi: 10.1198/jasa.2010.tm09329.
In the context of large-scale multiple hypothesis testing, the hypotheses often possess certain group structures based on additional information such as Gene Ontology in gene expression data and phenotypes in genome-wide association studies. It is hence desirable to incorporate such information when dealing with multiplicity problems to increase statistical power. In this article, we demonstrate the benefit of considering group structure by presenting a p-value weighting procedure that uses the relative importance of each group while controlling the false discovery rate under weak conditions. The procedure is easy to implement and is shown to be more powerful than the classical Benjamini-Hochberg procedure in both theoretical and simulation studies. By estimating the proportion of true null hypotheses, the data-driven procedure controls the false discovery rate asymptotically. Our analysis of a breast cancer dataset confirms that the procedure compares favorably with the classical method.
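The abstract does not spell out the weighting rule, so the following is only a minimal sketch of one group-weighted Benjamini-Hochberg scheme, under stated assumptions. Each p-value in group g is multiplied by the group's null odds π_g0 / (1 − π_g0), where π_g0 is the proportion of true nulls in that group. Groups with π_g0 = 1 receive an infinite weighted p-value. BH is then run on the pooled weighted p-values at the inflated level α / (1 − π_0), where π_0 is the overall null proportion. For the data-driven variant, π_g0 is estimated per group with a Storey-type estimator at λ = 0.5. That estimator choice, the function names (`bh_reject`, `gbh_reject`, `storey_pi0`, `adaptive_gbh_reject`), and the edge-case handling are illustrative assumptions, not the article's exact specification.

```python
from typing import Dict, Hashable, List, Sequence


def bh_reject(pvals: Sequence[float], alpha: float) -> List[int]:
    """Classical Benjamini-Hochberg step-up; returns sorted indices of rejections."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / n:
            k = rank  # keep the largest rank meeting the step-up condition
    return sorted(order[:k])


def gbh_reject(pvals: Sequence[float], groups: Sequence[Hashable],
               pi0_by_group: Dict[Hashable, float], alpha: float) -> List[int]:
    """Group-weighted BH with known (oracle) per-group null proportions (assumed form)."""
    n = len(pvals)
    counts: Dict[Hashable, int] = {}
    for g in groups:
        counts[g] = counts.get(g, 0) + 1
    # Overall null proportion as a size-weighted average of group proportions.
    pi0 = sum(counts[g] * pi0_by_group[g] for g in counts) / n
    if pi0 >= 1.0:
        return []  # everything is null: reject nothing
    weighted = []
    for p, g in zip(pvals, groups):
        pg0 = pi0_by_group[g]
        # Weight by the group's null odds; an all-null group can never be rejected.
        weighted.append(float("inf") if pg0 >= 1.0 else p * pg0 / (1.0 - pg0))
    return bh_reject(weighted, alpha / (1.0 - pi0))


def storey_pi0(pvals: Sequence[float], lam: float = 0.5) -> float:
    """Storey-type estimate of the null proportion (a swapped-in, assumed estimator)."""
    m = len(pvals)
    return min(1.0, sum(1 for p in pvals if p > lam) / ((1.0 - lam) * m))


def adaptive_gbh_reject(pvals: Sequence[float], groups: Sequence[Hashable],
                        alpha: float, lam: float = 0.5) -> List[int]:
    """Data-driven variant: plug per-group estimates of pi_g0 into gbh_reject."""
    by_group: Dict[Hashable, List[float]] = {}
    for p, g in zip(pvals, groups):
        by_group.setdefault(g, []).append(p)
    pi0_hat = {g: storey_pi0(ps, lam) for g, ps in by_group.items()}
    return gbh_reject(pvals, groups, pi0_hat, alpha)


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.5]
    g = ["a", "a", "b", "b"]
    print(bh_reject(p, 0.05))                             # [0]
    print(gbh_reject(p, g, {"a": 0.5, "b": 1.0}, 0.05))   # [0, 1]
```

In this toy run, BH at α = 0.05 rejects only the first hypothesis. The weighted version also rejects the second one, because all of group "b" is declared null and the testing level rises to 0.05 / 0.25 = 0.2. This mirrors the intuition in the abstract that concentrating effort on groups rich in signal can increase power.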