van der Laan Mark J, Birkner Merrill D, Hubbard Alan E
Division of Biostatistics, School of Public Health, University of California, Berkeley, USA.
Stat Appl Genet Mol Biol. 2005;4:Article29. doi: 10.2202/1544-6115.1143. Epub 2005 Oct 7.
Simultaneously testing a collection of null hypotheses about a data generating distribution, based on a sample of independent and identically distributed observations, is a fundamental statistical problem with many applications. In this article we propose a new resampling-based multiple testing procedure that asymptotically controls, at level alpha, the probability that the proportion of false positives among the set of rejections exceeds q, where q and alpha are user-supplied numbers. The procedure involves 1) specifying a conditional distribution for a guessed set of true null hypotheses, given the data, which is asymptotically degenerate at the true set of null hypotheses, and 2) specifying a generally valid null distribution for the vector of test statistics, as proposed in Pollard & van der Laan (2003) and generalized in our subsequent articles Dudoit, van der Laan, & Pollard (2004), van der Laan, Dudoit, & Pollard (2004), and van der Laan, Dudoit, & Pollard (2004b). Ingredient 1) is established by fitting the empirical Bayes two-component mixture model (Efron (2001b)) to the data to obtain an upper bound on the marginal posterior probabilities of the null being true, given the data. We establish the finite sample rationale behind our proposal, and prove that this new multiple testing procedure asymptotically controls the desired tail probability for the proportion of false positives under general data generating distributions. In addition, we provide simulation studies indicating that this method is generally more powerful in finite samples than our previously proposed augmentation multiple testing procedure (van der Laan, Dudoit, & Pollard (2004b)) and competing procedures from the literature. Finally, we illustrate our methodology with a data analysis.
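To make the two ingredients concrete, the following is a minimal Python sketch of one plausible reading of the procedure, not the authors' implementation. It assumes that the guessed set of true nulls is drawn by independent Bernoulli draws from the (upper bounds on the) posterior null probabilities, that guessed nulls receive statistics drawn from the null distribution while the remaining hypotheses keep their observed statistics, and that a hypothesis is rejected when its statistic exceeds a common cutoff. The function names `tppfp_estimate` and `select_cutoff`, their parameters, and the grid search over candidate cutoffs are all hypothetical choices made for illustration.

```python
import random


def tppfp_estimate(observed, null_draws, null_probs, cutoff, q, rng):
    """Monte Carlo estimate of P(V / R > q) at a common cutoff.

    observed   : observed test statistics, one per hypothesis
    null_draws : list of vectors sampled from a null distribution of the
                 test statistics (assumed to be supplied by the caller)
    null_probs : per-hypothesis posterior probability (or upper bound)
                 that the null is true (assumed to be supplied by the caller)
    """
    exceed = 0
    for draw in null_draws:
        # Ingredient 1: guess the set of true nulls (assumed Bernoulli draws).
        is_null = [rng.random() < p for p in null_probs]
        # Ingredient 2: guessed nulls get null-distribution statistics;
        # guessed alternatives keep their observed statistics.
        stats = [d if g else t for d, t, g in zip(draw, observed, is_null)]
        rejected = [s > cutoff for s in stats]
        r = sum(rejected)                                   # guessed rejections
        v = sum(1 for rej, g in zip(rejected, is_null) if rej and g)  # guessed false positives
        if r > 0 and v / r > q:
            exceed += 1
    return exceed / len(null_draws)


def select_cutoff(observed, null_draws, null_probs, candidates, q, alpha, seed=0):
    """Return the smallest candidate cutoff whose estimated tail
    probability is at most alpha, or None if no candidate qualifies."""
    for cutoff in sorted(candidates):
        # Re-seed so every cutoff is evaluated on the same guessed null sets.
        rng = random.Random(seed)
        if tppfp_estimate(observed, null_draws, null_probs, cutoff, q, rng) <= alpha:
            return cutoff
    return None
```

In this sketch a smaller cutoff means more rejections, so picking the smallest qualifying cutoff is the least conservative choice on the grid. Estimating the null distribution and fitting the two-component mixture model are outside the scope of the sketch and are treated as inputs.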