Contra: Contrarian statistics for controlled variable selection.

Authors

Sudarshan Mukund, Puli Aahlad, Subramanian Lakshmi, Sankararaman Sriram, Ranganath Rajesh

Affiliations

Courant Institute, New York University.

Department of Computer Science, University of California, Los Angeles.

Publication

Proc Mach Learn Res. 2021 Apr;130:1900-1908.

PMID: 34522887
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8436172/

Abstract

The holdout randomization test (HRT) discovers a set of covariates most predictive of a response. Given the covariate distribution, HRTs can explicitly control the false discovery rate (FDR). However, if this distribution is unknown and must be estimated from data, HRTs can inflate the FDR. To alleviate the inflation of FDR, we propose the contrarian randomization test (CONTRA), which is designed explicitly for scenarios where the covariate distribution must be estimated from data and may even be misspecified. Our key insight is to use an equal mixture of two "contrarian" probabilistic models in determining the importance of a covariate. One model is fit with the real data, while the other is fit using the same data, but with the covariate being tested replaced with samples from an estimate of the covariate distribution. CONTRA is flexible enough to achieve a power of 1 asymptotically, can reduce the FDR compared to state-of-the-art CVS methods when the covariate distribution is misspecified, and is computationally efficient in high dimensions and large sample sizes. We further demonstrate the effectiveness of CONTRA on numerous synthetic benchmarks, and highlight its capabilities on a genetic dataset.
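
The mixture idea in the abstract can be illustrated with a small sketch. This is one possible reading of the abstract, not the paper's exact procedure: it assumes Gaussian linear models for both the response and the covariate distribution, uses the mean held-out log-likelihood of the equal mixture as the test statistic, and computes a one-sided randomization p-value. The names `fit_ols`, `predict`, `gauss_logpdf`, and `contrarian_pvalue`, as well as the synthetic data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients, with an intercept column appended."""
    Xb = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def gauss_logpdf(resid, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (resid / sd) ** 2

def contrarian_pvalue(X, y, j, n_null=200, seed=0):
    """Illustrative (assumed) mixture-based randomization test for covariate j."""
    rng = np.random.default_rng(seed)
    half = len(y) // 2
    X_tr, y_tr, X_te, y_te = X[:half], y[:half], X[half:], y[half:]

    # Estimated covariate distribution q(x_j | x_-j): assumed Gaussian linear.
    others_tr = np.delete(X_tr, j, axis=1)
    others_te = np.delete(X_te, j, axis=1)
    q_coef = fit_ols(others_tr, X_tr[:, j])
    q_sd = np.std(X_tr[:, j] - predict(q_coef, others_tr))

    def resample(X_part, others):
        # Replace column j with a draw from the estimated q(x_j | x_-j).
        X_null = X_part.copy()
        noise = rng.standard_normal(len(X_part))
        X_null[:, j] = predict(q_coef, others) + q_sd * noise
        return X_null

    # Model 1: fit on the real training data.
    coef_real = fit_ols(X_tr, y_tr)
    sd_real = np.std(y_tr - predict(coef_real, X_tr))

    # Model 2: fit on the same data with covariate j replaced by null samples.
    X_tr_null = resample(X_tr, others_tr)
    coef_null = fit_ols(X_tr_null, y_tr)
    sd_null = np.std(y_tr - predict(coef_null, X_tr_null))

    def statistic(X_eval):
        # Mean held-out log-likelihood under the equal mixture of both models.
        ll_real = gauss_logpdf(y_te - predict(coef_real, X_eval), sd_real)
        ll_null = gauss_logpdf(y_te - predict(coef_null, X_eval), sd_null)
        return np.mean(np.logaddexp(ll_real, ll_null) + np.log(0.5))

    t_obs = statistic(X_te)
    t_null = np.array(
        [statistic(resample(X_te, others_te)) for _ in range(n_null)]
    )
    # One-sided p-value: how often null data scores at least as well as real data.
    return (1 + np.sum(t_null >= t_obs)) / (n_null + 1)

# Synthetic example: only covariate 0 affects the response.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((600, 5))
y = 3.0 * X[:, 0] + data_rng.standard_normal(600)

p_relevant = contrarian_pvalue(X, y, j=0)
p_irrelevant = contrarian_pvalue(X, y, j=1)
print(p_relevant, p_irrelevant)
```

In this sketch a small p-value for a covariate suggests that replacing it with null samples hurts the mixture's held-out fit. Controlling the FDR across many covariates would additionally require a multiple-testing step applied to the per-covariate p-values, which is omitted here.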
