DIET: Conditional independence testing with marginal dependence measures of residual information.

Author information

Mukund Sudarshan, Aahlad Puli, Wesley Tansey, Rajesh Ranganath

Affiliations

Computer Science, New York University.

Computational Oncology Memorial Sloan Kettering Cancer Center.

Publication information

Proc Mach Learn Res. 2023 Apr;206:10343-10367.

PMID: 37681192

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10484293/

Abstract

Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$, having observed covariates $z$. CRTs require fitting a large number of predictive models, which is often computationally intractable. Existing solutions to reduce the cost of CRTs typically split the dataset into training and test portions, or rely on heuristics for interactions; both lead to a loss in power. We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues by leveraging marginal independence statistics to test conditional independence relationships. DIET tests the marginal independence of two random variables, $F_{x \mid z}(x \mid z)$ and $F_{y \mid z}(y \mid z)$, where $F_{\cdot \mid z}(\cdot \mid z)$ is the conditional cumulative distribution function (CDF) of the distribution $p(\cdot \mid z)$. These variables are termed "information residuals." We give sufficient conditions for DIET to achieve finite-sample type-1 error control and power greater than the type-1 error rate. We then prove that when the mutual information between the information residuals is used as the test statistic, DIET yields the most powerful conditionally valid test. Finally, we show that DIET achieves higher power than other tractable CRTs on several synthetic and real benchmarks.
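The abstract describes DIET as two steps: compute the information residuals of $x$ and $y$ given $z$, then apply a marginal independence test to them. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes linear-Gaussian models for $x \mid z$ and $y \mid z$ (so each conditional CDF reduces to a normal CDF of a least-squares residual), swaps in a simple binned plug-in estimate of mutual information as the marginal dependence statistic, and calibrates it with a permutation test. The names `gaussian_info_residual`, `binned_mutual_information`, and `diet_sketch`, as well as the bin and permutation counts, are hypothetical choices for illustration. Because the CDFs are estimated on the same data, the resulting p-value is approximate and should not be assumed to inherit the paper's finite-sample guarantees.

```python
import math

import numpy as np


def gaussian_info_residual(target, z):
    """Estimate the information residual F_{target|z}(target | z).

    Hypothetical helper. Assumes target | z ~ N(b0 + z @ b, sigma^2), fit by
    ordinary least squares, so the conditional CDF is the standard normal CDF
    of the standardized regression residual.
    """
    design = np.column_stack([np.ones(len(target)), z])  # intercept + covariates
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    sigma = resid.std(ddof=design.shape[1])  # residual scale estimate
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(resid / (sigma * math.sqrt(2.0))))  # Phi(resid / sigma)


def binned_mutual_information(u, v, bins=8):
    """Plug-in mutual information (nats) from a 2-D histogram on [0, 1]^2."""
    joint, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    joint /= joint.sum()
    pu = joint.sum(axis=1, keepdims=True)  # marginal of u, shape (bins, 1)
    pv = joint.sum(axis=0, keepdims=True)  # marginal of v, shape (1, bins)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pu @ pv)[nz])))


def diet_sketch(x, y, z, n_perm=500, seed=0):
    """Return (statistic, permutation p-value) for H0: x independent of y given z."""
    rng = np.random.default_rng(seed)
    u = gaussian_info_residual(x, z)  # information residual of x
    v = gaussian_info_residual(y, z)  # information residual of y
    observed = binned_mutual_information(u, v)
    # Permuting v breaks any u-v dependence, giving a reference distribution
    # for a marginal independence test between the two residuals.
    null = [binned_mutual_information(u, rng.permutation(v)) for _ in range(n_perm)]
    p_value = (1 + sum(s >= observed for s in null)) / (1 + n_perm)
    return observed, p_value


# Synthetic demo: x and y both depend linearly on z.
rng = np.random.default_rng(1)
n = 1000
z = rng.normal(size=(n, 2))
x = z @ np.array([1.0, -0.5]) + rng.normal(size=n)
y_null = z @ np.array([0.3, 0.8]) + rng.normal(size=n)  # x independent of y given z
y_alt = y_null + 0.5 * x  # y depends on x even after conditioning on z
mi_null, p_null = diet_sketch(x, y_null, z)
mi_alt, p_alt = diet_sketch(x, y_alt, z)
print(f"null: MI={mi_null:.3f}, p={p_null:.3f}")
print(f"alt:  MI={mi_alt:.3f}, p={p_alt:.3f}")
```

The least-squares fit keeps the sketch short; if the conditional model is misspecified (for example, nonlinear dependence on $z$), the residuals need not be uniform or independent under the null, so a more flexible conditional density estimator would be needed in practice.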

