Targeted learning with an undersmoothed LASSO propensity score model for large-scale covariate adjustment in health-care database studies.

Affiliations

Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02120, United States.

Division of Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, CA 94720, United States.

Publication information

Am J Epidemiol. 2024 Nov 4;193(11):1632-1640. doi: 10.1093/aje/kwae023.

DOI: 10.1093/aje/kwae023

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11538566/

Abstract

Least absolute shrinkage and selection operator (LASSO) regression is widely used for large-scale propensity score (PS) estimation in health-care database studies. In these settings, previous work has shown that undersmoothing (overfitting) LASSO PS models can improve confounding control, but it can also cause problems of nonoverlap in covariate distributions. It remains unclear how to select the degree of undersmoothing when fitting large-scale LASSO PS models to improve confounding control while avoiding issues that can result from reduced covariate overlap. Here, we used simulations to evaluate the performance of using collaborative-controlled targeted learning to data-adaptively select the degree of undersmoothing when fitting large-scale PS models within both singly and doubly robust frameworks to reduce bias in causal estimators. Simulations showed that collaborative learning can data-adaptively select the degree of undersmoothing to reduce bias in estimated treatment effects. Results further showed that when fitting undersmoothed LASSO PS models, the use of cross-fitting was important for avoiding nonoverlap in covariate distributions and reducing bias in causal estimates.
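The ingredients named in the abstract (a LASSO PS model fitted at progressively weaker penalties, cross-fitting, and a weighted effect estimate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the simulated data, the penalty grid, the proximal-gradient solver, and all function names are hypothetical, and the sketch does not perform the collaborative-controlled selection of the penalty that the paper evaluates. It only prints how many covariates are selected, the range of the cross-fitted propensity scores, and an inverse-probability-weighted estimate at each penalty.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_lasso_logistic(X, a, lam, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.
    The intercept is not penalized. Returns (intercept, coefficients)."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    # Step size 1/L, with L an upper bound on the Lipschitz constant
    # of the gradient of the mean logistic loss.
    step = 4.0 * n / (np.linalg.norm(Xd, 2) ** 2)
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        resid = sigmoid(b0 + X @ beta) - a
        b0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return b0, beta

def cross_fit_ps(X, a, lam, n_folds=5, seed=0):
    """Predict each unit's PS from a model fitted on the other folds."""
    n = X.shape[0]
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    ps = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        b0, beta = fit_lasso_logistic(X[~test], a[~test], lam)
        ps[test] = sigmoid(b0 + X[test] @ beta)
    return ps

def ipw_difference(y, a, ps):
    """Normalized (Hajek) inverse-probability-weighted mean difference."""
    w1, w0 = a / ps, (1 - a) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

# Hypothetical simulated data: 20 covariates, two of which drive treatment.
rng = np.random.default_rng(42)
n, p = 400, 20
X = rng.normal(size=(n, p))
a = (rng.random(n) < sigmoid(0.8 * X[:, 0] - 0.5 * X[:, 1])).astype(float)
y = 1.0 * a + X[:, 0] + rng.normal(size=n)

# Smaller penalties correspond to more undersmoothing (less shrinkage).
for lam in [0.1, 0.03, 0.01, 0.003]:
    _, beta_full = fit_lasso_logistic(X, a, lam)
    ps = cross_fit_ps(X, a, lam)
    print(f"lam={lam}: selected={np.count_nonzero(beta_full)}, "
          f"ps range=({ps.min():.3f}, {ps.max():.3f}), "
          f"IPW estimate={ipw_difference(y, a, ps):.3f}")
```

Watching the propensity score range as the penalty shrinks is one simple way to see the overlap concern the abstract raises: scores that drift toward 0 or 1 produce large weights and unstable estimates.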


Similar articles

1

Targeted learning with an undersmoothed LASSO propensity score model for large-scale covariate adjustment in health-care database studies.

Am J Epidemiol. 2024 Nov 4;193(11):1632-1640. doi: 10.1093/aje/kwae023.

2

Prognostic score-based model averaging approach for propensity score estimation.

BMC Med Res Methodol. 2024 Oct 3;24(1):228. doi: 10.1186/s12874-024-02350-y.

3

Note on targeted learning with an undersmoothed Lasso propensity score model for large-scale covariate adjustment in health care database studies.

Am J Epidemiol. 2025 May 7;194(5):1470-1472. doi: 10.1093/aje/kwaf024.

4

Collaborative-controlled LASSO for constructing propensity score-based estimators in high-dimensional data.

Stat Methods Med Res. 2019 Apr;28(4):1044-1063. doi: 10.1177/0962280217744588. Epub 2017 Dec 11.

5

A comparison of confounder selection and adjustment methods for estimating causal effects using large healthcare databases.

Pharmacoepidemiol Drug Saf. 2022 Apr;31(4):424-433. doi: 10.1002/pds.5403. Epub 2022 Jan 7.

6

Using Super Learner Prediction Modeling to Improve High-dimensional Propensity Score Estimation.

Epidemiology. 2018 Jan;29(1):96-106. doi: 10.1097/EDE.0000000000000762.

7

8

Outcome-adaptive lasso: Variable selection for causal inference.

Biometrics. 2017 Dec;73(4):1111-1122. doi: 10.1111/biom.12679. Epub 2017 Mar 8.

9

How Effective Are Machine Learning and Doubly Robust Estimators in Incorporating High-Dimensional Proxies to Reduce Residual Confounding?

Pharmacoepidemiol Drug Saf. 2025 May;34(5):e70155. doi: 10.1002/pds.70155.

10

The role of prediction modeling in propensity score estimation: an evaluation of logistic regression, bCART, and the covariate-balancing propensity score.

Am J Epidemiol. 2014 Sep 15;180(6):645-55. doi: 10.1093/aje/kwu181. Epub 2014 Aug 20.

Cited by

1

Individualized Prediction of Postoperative Survival in Gallbladder Cancer: A Nomogram Based on SEER Data and External Validation.

Cancers (Basel). 2025 Jun 9;17(12):1919. doi: 10.3390/cancers17121919.

2

Commentary on "Nonparametric identification is not enough, but randomized controlled trials are": Statistical considerations for generating reliable evidence across a spectrum of studies that increasingly involve real-world elements.

Obs Stud. 2025 Apr 11;11(1):61-76. doi: 10.1353/obs.2025.a956842. eCollection 2025.

References

1

Nonparametric inverse-probability-weighted estimators based on the highly adaptive lasso.

Biometrics. 2023 Jun;79(2):1029-1041. doi: 10.1111/biom.13719. Epub 2022 Jul 27.

2

Data-Adaptive Selection of the Propensity Score Truncation Level for Inverse-Probability-Weighted and Targeted Maximum Likelihood Estimators of Marginal Point Treatment Effects.

Am J Epidemiol. 2022 Aug 22;191(9):1640-1651. doi: 10.1093/aje/kwac087.

3

Normalized Augmented Inverse Probability Weighting with Neural Network Predictions.

Entropy (Basel). 2022 Jan 25;24(2):179. doi: 10.3390/e24020179.

4

Challenges in Obtaining Valid Causal Effect Estimates with Machine Learning Algorithms.

Am J Epidemiol. 2023 Sep 1;192(9). doi: 10.1093/aje/kwab201. Epub 2021 Jul 15.

5

Machine Learning for Causal Inference: On the Use of Cross-fit Estimators.

Epidemiology. 2021 May 1;32(3):393-401. doi: 10.1097/EDE.0000000000001332.

6

Principles of confounder selection.

Eur J Epidemiol. 2019 Mar;34(3):211-219. doi: 10.1007/s10654-019-00494-6. Epub 2019 Mar 6.

7

Real-World Evidence and Real-World Data for Evaluating Drug Safety and Effectiveness.

JAMA. 2018 Sep 4;320(9):867-868. doi: 10.1001/jama.2018.10136.

8

Improving reproducibility by using high-throughput observational studies with empirical calibration.

Philos Trans A Math Phys Eng Sci. 2018 Sep 13;376(2128). doi: 10.1098/rsta.2017.0356.

9

Automated data-adaptive analytics for electronic healthcare data to study causal treatment effects.

Clin Epidemiol. 2018 Jul 6;10:771-788. doi: 10.2147/CLEP.S166545. eCollection 2018.

10

Evaluating large-scale propensity score performance through real-world and synthetic data experiments.

Int J Epidemiol. 2018 Dec 1;47(6):2005-2014. doi: 10.1093/ije/dyy120.
