Contra: Contrarian statistics for controlled variable selection.

Author information

Sudarshan Mukund, Puli Aahlad, Subramanian Lakshmi, Sankararaman Sriram, Ranganath Rajesh

Affiliations

Courant Institute, New York University.

Department of Computer Science, University of California, Los Angeles.

Publication information

Proc Mach Learn Res. 2021 Apr;130:1900-1908.

Abstract

The holdout randomization test (HRT) discovers a set of covariates most predictive of a response. Given the covariate distribution, HRTs can explicitly control the false discovery rate (FDR). However, if this distribution is unknown and must be estimated from data, HRTs can inflate the FDR. To alleviate this inflation, we propose the contrarian randomization test (CONTRA), which is designed explicitly for scenarios where the covariate distribution must be estimated from data and may even be misspecified. Our key insight is to use an equal mixture of two "contrarian" probabilistic models in determining the importance of a covariate. One model is fit with the real data, while the other is fit using the same data, but with the covariate being tested replaced with samples from an estimate of the covariate distribution. CONTRA is flexible enough to achieve a power of 1 asymptotically, can reduce the FDR compared to state-of-the-art controlled variable selection (CVS) methods when the covariate distribution is misspecified, and is computationally efficient in high dimensions and at large sample sizes. We further demonstrate the effectiveness of CONTRA on numerous synthetic benchmarks, and highlight its capabilities on a genetic dataset.
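
The mixture idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linear-Gaussian models, the regression-based estimate of the covariate distribution, the held-out log-likelihood statistic, the randomization p-value formula, and all function names below are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def fit_linear_gaussian(X, y):
    """Least-squares fit of y on X; returns (coefficients incl. intercept, residual std)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sigma = max(np.std(y - A @ beta), 1e-8)
    return beta, sigma

def log_density(model, X, y):
    """Per-sample Gaussian log-density of y given X under a fitted model."""
    beta, sigma = model
    mu = np.column_stack([np.ones(len(X)), X]) @ beta
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2

def sample_null_covariate(cov_model, X, j, rng):
    """Copy X with column j redrawn from the estimated q(x_j | other covariates)."""
    beta, sigma = cov_model
    rest = np.delete(X, j, axis=1)
    mu = np.column_stack([np.ones(len(X)), rest]) @ beta
    X_null = X.copy()
    X_null[:, j] = mu + sigma * rng.standard_normal(len(X))
    return X_null

def mixture_statistic(real_model, null_model, X, y):
    """Mean log-likelihood under an equal mixture of the two models."""
    ll_real = log_density(real_model, X, y)
    ll_null = log_density(null_model, X, y)
    return np.mean(np.logaddexp(ll_real, ll_null) - np.log(2.0))

def contrarian_pvalue(X_tr, y_tr, X_te, y_te, j, n_draws=200, seed=0):
    """Randomization p-value for covariate j (assumed form, for illustration only)."""
    rng = np.random.default_rng(seed)
    # Estimate the covariate distribution from data (assumed: linear-Gaussian).
    cov_model = fit_linear_gaussian(np.delete(X_tr, j, axis=1), X_tr[:, j])
    # One model sees the real data; its "contrarian" sees column j replaced.
    real_model = fit_linear_gaussian(X_tr, y_tr)
    null_model = fit_linear_gaussian(sample_null_covariate(cov_model, X_tr, j, rng), y_tr)
    observed = mixture_statistic(real_model, null_model, X_te, y_te)
    null_stats = np.array([
        mixture_statistic(real_model, null_model,
                          sample_null_covariate(cov_model, X_te, j, rng), y_te)
        for _ in range(n_draws)
    ])
    # A larger held-out likelihood on real data than on resampled data is evidence for j.
    return (1 + np.sum(null_stats >= observed)) / (1 + n_draws)
```

Calling `contrarian_pvalue` once per covariate would yield p-values that could then be passed to a multiple-testing procedure such as Benjamini-Hochberg to target a given FDR; that final step is likewise an assumption here, not something the abstract specifies.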
