

Fast and powerful conditional randomization testing via distillation.

Author information

Liu Molei, Katsevich Eugene, Janson Lucas, Ramdas Aaditya

Affiliations

Department of Biostatistics, Harvard Chan School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, U.S.A.

Department of Statistics and Data Science, Wharton School of the University of Pennsylvania, 265 South 37th Street, Philadelphia, Pennsylvania 19104, U.S.A.

Publication information

Biometrika. 2022 Jun;109(2):277-293. doi: 10.1093/biomet/asab039. Epub 2021 Jul 8.

Abstract

We consider the problem of conditional independence testing: given a response $Y$ and covariates $(X, Z)$, we test the null hypothesis that $Y \perp X \mid Z$. The conditional randomization test was recently proposed as a way to use distributional information about $X \mid Z$ to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about $Y \mid (X, Z)$. This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet the direct use of such advanced test statistics in the conditional randomization test is prohibitively computationally expensive, especially with multiple testing, due to the requirement to recompute the test statistic many times on resampled data. We propose the distilled conditional randomization test, a novel approach to using state-of-the-art machine learning algorithms in the conditional randomization test while drastically reducing the number of times those algorithms need to be run, thereby taking advantage of their power and the conditional randomization test's statistical guarantees without suffering the usual computational expense. In addition to distillation, we propose a number of other tricks, like screening and recycling computations, to further speed up the conditional randomization test without sacrificing its high power and exact validity. Indeed, we show in simulations that all our proposals combined lead to a test that has similar power to the most powerful existing conditional randomization test implementations, but requires orders of magnitude less computation, making it a practical tool even for large datasets. We demonstrate these benefits on a breast cancer dataset by identifying biomarkers related to cancer stage.
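The distillation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It writes Y for the response, X for the covariate under test, and Z for the remaining covariates (notation assumed here), and it assumes that X given Z is Gaussian with a known conditional mean and standard deviation. The function name `dcrt_pvalue`, the ordinary least squares fit used for distillation, and the absolute inner-product statistic are all illustrative choices. The point is that the potentially expensive fit of Y on Z runs once, while each resample costs only a dot product.

```python
import numpy as np


def dcrt_pvalue(y, x, z, cond_mean, cond_sd, n_resamples=500, seed=0):
    """Illustrative distilled CRT p-value (hypothetical helper).

    Assumes X | Z ~ Normal(cond_mean, cond_sd**2) with both quantities known.
    """
    rng = np.random.default_rng(seed)
    n = len(y)

    # Distillation step, run once: remove from y what z explains.
    # OLS stands in here for any (possibly expensive) learning algorithm.
    design = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_resid = y - design @ coef

    # Cheap test statistic: |<y residual, x minus its conditional mean>|.
    t_obs = abs(y_resid @ (x - cond_mean))

    # Resample x from its known conditional law; z and y stay fixed,
    # so the distilled residual is reused for every resample.
    noise = cond_sd * rng.standard_normal((n_resamples, n))
    t_null = np.abs(noise @ y_resid)

    # Standard randomization p-value with the +1 finite-sample correction.
    return (1 + np.sum(t_null >= t_obs)) / (n_resamples + 1)
```

With `n_resamples=500`, the smallest attainable p-value is 1/501, so the number of resamples bounds the resolution of the test; a nonlinear learner could replace the least squares fit without changing the resampling loop.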

