Machine Learning for Causal Inference: On the Use of Cross-fit Estimators.

Author affiliations

From the Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC.

Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC.

Publication information

Epidemiology. 2021 May 1;32(3):393-401. doi: 10.1097/EDE.0000000000001332.

DOI:10.1097/EDE.0000000000001332

PMID:33591058

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8012235/

Abstract

BACKGROUND

Modern causal inference methods allow machine learning to be used to weaken parametric modeling assumptions. However, the use of machine learning may result in complications for inference. Doubly robust cross-fit estimators have been proposed to yield better statistical properties.

METHODS

We conducted a simulation study to assess the performance of several different estimators for the average causal effect. The data generating mechanisms for the simulated treatment and outcome included log-transforms, polynomial terms, and discontinuities. We compared singly robust estimators (g-computation, inverse probability weighting) and doubly robust estimators (augmented inverse probability weighting, targeted maximum likelihood estimation). We estimated nuisance functions with parametric models and ensemble machine learning separately. We further assessed doubly robust cross-fit estimators.
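To make the idea of a doubly robust cross-fit estimator concrete, the sketch below shows one simple way an augmented inverse probability weighting (AIPW) estimate could be cross-fit. It is an illustration under stated assumptions, not the procedure used in the study: the toy data generator (`simulate`), the single binary confounder, the two-fold split, and the stratified sample means that stand in for ensemble machine learning are all hypothetical choices made to keep the example self-contained. The key step is that the nuisance functions (propensity score and outcome regression) are fit on one part of the data and evaluated only on the held-out part.

```python
import random
from statistics import mean


def simulate(n, seed=0):
    """Hypothetical toy data: binary confounder W, binary treatment A,
    continuous outcome Y. The true average causal effect is 1.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if w else 0.3) else 0
        y = 1.0 * a + 2.0 * w + rng.gauss(0, 1)
        data.append((w, a, y))
    return data


def fit_nuisance(train):
    """Estimate nuisance functions on the training split.
    Stratified means are an assumption standing in for ML learners."""
    ps = {}   # ps[w] estimates P(A=1 | W=w)
    out = {}  # out[(a, w)] estimates E[Y | A=a, W=w]
    for w in (0, 1):
        rows = [r for r in train if r[0] == w]
        ps[w] = mean(r[1] for r in rows)
        for a in (0, 1):
            out[(a, w)] = mean(r[2] for r in rows if r[1] == a)
    return ps, out


def aipw_scores(rows, ps, out):
    """Per-observation AIPW contributions for the held-out rows."""
    scores = []
    for w, a, y in rows:
        pi, m1, m0 = ps[w], out[(1, w)], out[(0, w)]
        s1 = m1 + a * (y - m1) / pi
        s0 = m0 + (1 - a) * (y - m0) / (1 - pi)
        scores.append(s1 - s0)
    return scores


def cross_fit_aipw(data, n_splits=2, seed=0):
    """K-fold cross-fit AIPW: fit nuisances off-fold, score on-fold."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[k::n_splits] for k in range(n_splits)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        test = [data[i] for i in fold]
        ps, out = fit_nuisance(train)
        scores.extend(aipw_scores(test, ps, out))
    n = len(scores)
    est = mean(scores)
    # Wald-type interval from the empirical variance of the scores
    se = (sum((s - est) ** 2 for s in scores) / (n - 1) / n) ** 0.5
    return est, (est - 1.96 * se, est + 1.96 * se)


if __name__ == "__main__":
    estimate, ci = cross_fit_aipw(simulate(4000, seed=1), n_splits=2, seed=1)
    print("ACE estimate:", estimate, "95% CI:", ci)
```

In practice the stratified means would be replaced by flexible learners, and the sample splitting is often repeated over several random partitions with the results summarized across repetitions; the sketch omits that step for brevity.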

RESULTS

With correctly specified parametric models, all of the estimators were unbiased and confidence intervals achieved nominal coverage. When used with machine learning, the doubly robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.

CONCLUSIONS

Due to the difficulty of properly specifying parametric models in high-dimensional data, doubly robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimation of the average causal effect in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.

Similar articles

Challenges in Obtaining Valid Causal Effect Estimates with Machine Learning Algorithms.

Am J Epidemiol. 2023 Sep 1;192(9). doi: 10.1093/aje/kwab201. Epub 2021 Jul 15.

AIPW: An R Package for Augmented Inverse Probability-Weighted Estimation of Average Causal Effects.

Am J Epidemiol. 2021 Dec 1;190(12):2690-2699. doi: 10.1093/aje/kwab207.

Machine learning outcome regression improves doubly robust estimation of average causal effects.

Pharmacoepidemiol Drug Saf. 2020 Sep;29(9):1120-1133. doi: 10.1002/pds.5074. Epub 2020 Jul 27.

Double Robust Efficient Estimators of Longitudinal Treatment Effects: Comparative Performance in Simulations and a Case Study.

Int J Biostat. 2019 Feb 26;15(2):/j/ijb.2019.15.issue-2/ijb-2017-0054/ijb-2017-0054.xml. doi: 10.1515/ijb-2017-0054.

Understanding and diagnosing the potential for bias when using machine learning methods with doubly robust causal estimators.

Stat Methods Med Res. 2019 Jun;28(6):1637-1650. doi: 10.1177/0962280218772065. Epub 2018 May 2.

Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies.

Am J Epidemiol. 2017 Jan 1;185(1):65-73. doi: 10.1093/aje/kww165. Epub 2016 Dec 9.

Collaborative double robust targeted maximum likelihood estimation.

Int J Biostat. 2010 May 17;6(1):Article 17. doi: 10.2202/1557-4679.1181.

Doubly robust inference for targeted minimum loss-based estimation in randomized trials with missing outcome data.

Stat Med. 2017 Oct 30;36(24):3807-3819. doi: 10.1002/sim.7389. Epub 2017 Jul 25.

Data-Adaptive Bias-Reduced Doubly Robust Estimation.

Int J Biostat. 2016 May 1;12(1):253-82. doi: 10.1515/ijb-2015-0029.

Cited by

Mechanisms of Change in Exposure Therapy for Anxiety and Related Disorders: A Research Agenda.

Clin Psychol Sci. 2025 Jul;13(4):687-719. doi: 10.1177/21677026241240727. Epub 2024 May 25.

Enhanced doubly robust estimation with concave link functions for estimands in clinical trials.

J Nonparametr Stat. 2024 Mar 12. doi: 10.1080/10485252.2024.2328078.

Using Machine Learning to Improve Control for Confounding in the Dynamic Weighted Ordinary Least Squares Estimator of Optimal Adaptive Treatment Strategies.

Biom J. 2025 Aug;67(4):e70068. doi: 10.1002/bimj.70068.

Performance of Cross-Validated Targeted Maximum Likelihood Estimation.

Stat Med. 2025 Jul;44(15-17):e70185. doi: 10.1002/sim.70185.

High-Dimensional Disease Risk Score for Dealing With Residual Confounding Bias in Estimating Treatment Effects With a Survival Outcome.

Pharmacoepidemiol Drug Saf. 2025 Jul;34(7):e70172. doi: 10.1002/pds.70172.

Evaluation of Machine Learning-Based Propensity Score Estimation: A Benchmarking Observational Analysis Against a Randomized Trial.

medRxiv. 2025 Jun 17:2025.06.16.25329708. doi: 10.1101/2025.06.16.25329708.

How Effective Are Machine Learning and Doubly Robust Estimators in Incorporating High-Dimensional Proxies to Reduce Residual Confounding?

Pharmacoepidemiol Drug Saf. 2025 May;34(5):e70155. doi: 10.1002/pds.70155.

Guidelines and Best Practices for the Use of Targeted Maximum Likelihood and Machine Learning When Estimating Causal Effects of Exposures on Time-To-Event Outcomes.

Stat Med. 2025 Mar 15;44(6):e70034. doi: 10.1002/sim.70034.

Do we need flexible machine-learning algorithms to assess the effect of long-term exposure to fine particulate matter on mortality?: An example from a Canadian national cohort.

Environ Epidemiol. 2025 Mar 4;9(2):e375. doi: 10.1097/EE9.0000000000000375. eCollection 2025 Apr.

Natural language processing for scalable feature engineering and ultra-high-dimensional confounding adjustment in healthcare database studies.

medRxiv. 2025 Jan 31:2025.01.30.25321403. doi: 10.1101/2025.01.30.25321403.
