A method to increase the power of multiple testing procedures through sample splitting.

Authors

Rubin Daniel, Dudoit Sandrine, van der Laan Mark

Affiliation

Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA.

Publication information

Stat Appl Genet Mol Biol. 2006;5:Article19. doi: 10.2202/1544-6115.1148. Epub 2006 Aug 1.

Abstract

Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the expected number of false positives does not exceed a user-supplied threshold. Among such multiple testing procedures, we derive the most powerful method, meaning the test statistic cutoffs that maximize the expected number of true positives. Unfortunately, these optimal cutoffs depend on the true unknown data generating distribution, so could never be used in a practical setting. We instead consider splitting the sample so that the optimal cutoffs are estimated from a portion of the data, and then testing on the remaining data using these estimated cutoffs. When the null distributions for all test statistics are the same, the obvious way to control the expected number of false positives would be to use a common cutoff for all tests. In this work, we consider the common cutoff method as a benchmark multiple testing procedure. We show that in certain circumstances the use of estimated optimal cutoffs via sample splitting can dramatically outperform this benchmark method, resulting in increased true discoveries, while retaining Type-I error control. This paper is an updated version of the work presented in Rubin et al. (2005), later expanded upon by Wasserman and Roeder (2006).
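
To make the comparison in the abstract concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper: each test statistic is normal with unit variance, the null mean is zero, and alternatives are positive mean shifts. Under those assumptions, maximizing the expected number of true positives subject to the null tail probabilities summing to alpha gives cutoffs of the form c_m = log(lambda)/theta_m + theta_m/2, where theta_m is the shift of statistic m. The function names (`plugin_cutoffs`, `split_sample_test`, and so on) are hypothetical, and the plug-in estimate of the shifts from the first part of the sample is an illustrative choice, not necessarily the authors' estimator.

```python
import math
import random
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)


def upper_tail(x):
    """P(Z > x) for a standard normal Z; returns 0.0 when x is inf."""
    return 0.5 * math.erfc(x / SQRT2)


def common_cutoff(num_tests, alpha):
    """Benchmark: a single cutoff c with num_tests * P0(T > c) = alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / num_tests)


def plugin_cutoffs(shifts, alpha):
    """Cutoffs maximizing expected true positives for the given shifts,
    subject to sum_m P0(T_m > c_m) <= alpha, with 0 < alpha < 1.
    Assumes the normal mean-shift model described in the lead-in."""
    if not any(s > 0 for s in shifts):
        # No positive shift to exploit: fall back to the common cutoff.
        return [common_cutoff(len(shifts), alpha)] * len(shifts)

    def cutoffs(log_lam):
        # Likelihood ratio exp(s*c - s^2/2) = lambda  =>  c = log_lam/s + s/2.
        # Statistics with non-positive shift are never rejected (cutoff inf).
        return [log_lam / s + s / 2.0 if s > 0 else math.inf for s in shifts]

    def total_null_tail(log_lam):
        return sum(upper_tail(c) for c in cutoffs(log_lam))

    lo, hi = -1.0, 1.0
    while total_null_tail(lo) <= alpha:  # widen until the bracket is valid
        lo *= 2.0
    while total_null_tail(hi) > alpha:
        hi *= 2.0
    for _ in range(100):  # bisection on log(lambda)
        mid = 0.5 * (lo + hi)
        if total_null_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return cutoffs(hi)  # hi always satisfies the constraint


def expected_true_positives(shifts, cutoffs):
    """Sum over true alternatives (shift > 0) of P(T_m > c_m)."""
    return sum(upper_tail(c - s) for s, c in zip(shifts, cutoffs) if s > 0)


def split_sample_test(samples, alpha, n_train):
    """Estimate cutoffs from the first n_train observations of each
    hypothesis, then test on the remaining observations."""
    n_test = len(samples[0]) - n_train
    scale = math.sqrt(n_test)
    est_shifts = [scale * sum(x[:n_train]) / n_train for x in samples]
    cuts = plugin_cutoffs(est_shifts, alpha)
    stats = [scale * sum(x[n_train:]) / n_test for x in samples]
    return [t > c for t, c in zip(stats, cuts)]


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [1.0, 1.0] + [0.0] * 8
    data = [[rng.gauss(mu, 1.0) for _ in range(20)] for mu in true_means]
    print(split_sample_test(data, alpha=0.05, n_train=10))
```

Because the cutoffs in `split_sample_test` are computed only from the first part of the data, they are independent of the statistics tested on the second part, so in this sketch the null tail probabilities still sum to at most alpha. When the true shifts are known and concentrated on a few hypotheses, `plugin_cutoffs` spends the whole error budget on those hypotheses, which is why its expected number of true positives can exceed that of `common_cutoff`.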
