Ensemble learning of inverse probability weights for marginal structural modeling in large observational datasets.

Author information

Gruber Susan, Logan Roger W, Jarrín Inmaculada, Monge Susana, Hernán Miguel A

Affiliation

Department of Epidemiology, Harvard School of Public Health, Boston, MA, U.S.A.

Publication information

Stat Med. 2015 Jan 15;34(1):106-17. doi: 10.1002/sim.6322. Epub 2014 Oct 15.

DOI:10.1002/sim.6322

PMID:25316152

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4262745/

Abstract

Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results.
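
The single-partition idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the data are synthetic, treatment is a single time point rather than longitudinal, the candidate library contains only two toy estimators (`fit_marginal` and `fit_binned`, both hypothetical names), and the ensemble is a convex combination chosen by validation log-loss over a coarse grid. The stabilized weight is computed as the marginal probability of the observed treatment divided by its estimated conditional probability given the covariate.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def clip(p, lo=0.01, hi=0.99):
    # Keep probabilities away from 0 and 1 so logs and ratios stay finite.
    return min(max(p, lo), hi)


def log_loss(y, p):
    return -sum(a * math.log(q) + (1 - a) * math.log(1 - q)
                for a, q in zip(y, p)) / len(y)


def fit_marginal(xs, ys):
    # Candidate 1 (hypothetical): ignores the covariate entirely.
    rate = sum(ys) / len(ys)
    return lambda x: clip(rate)


def fit_binned(xs, ys, n_bins=8, lo=-2.0, hi=2.0):
    # Candidate 2 (hypothetical): treatment frequency within covariate bins.
    def bin_of(x):
        return min(n_bins - 1, max(0, int((x - lo) / (hi - lo) * n_bins)))
    counts, treated = [0] * n_bins, [0] * n_bins
    for x, a in zip(xs, ys):
        counts[bin_of(x)] += 1
        treated[bin_of(x)] += a
    overall = sum(ys) / len(ys)
    rates = [t / c if c else overall for t, c in zip(treated, counts)]
    return lambda x: clip(rates[bin_of(x)])


def ensemble_single_split(xs, ys, train_frac=0.7):
    # One partition: fit candidates on the training part, then pick the
    # convex combination with the lowest log-loss on the validation part.
    n_train = int(train_frac * len(xs))
    x_tr, y_tr = xs[:n_train], ys[:n_train]
    x_va, y_va = xs[n_train:], ys[n_train:]
    f1, f2 = fit_marginal(x_tr, y_tr), fit_binned(x_tr, y_tr)
    best_alpha, best_loss = 0.0, float("inf")
    for k in range(11):
        alpha = k / 10
        preds = [alpha * f1(x) + (1 - alpha) * f2(x) for x in x_va]
        loss = log_loss(y_va, preds)
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss
    return (lambda x: best_alpha * f1(x) + (1 - best_alpha) * f2(x)), best_alpha


def stabilized_weights(xs, ys, propensity):
    # sw = P(A = a) / P(A = a | x) for each subject's observed treatment a.
    p_marg = sum(ys) / len(ys)
    weights = []
    for x, a in zip(xs, ys):
        p = propensity(x)
        num = p_marg if a == 1 else 1 - p_marg
        den = p if a == 1 else 1 - p
        weights.append(num / den)
    return weights


# Synthetic data: treatment probability depends nonlinearly on the covariate.
rng = random.Random(0)
xs = [rng.uniform(-2, 2) for _ in range(2000)]
ys = [1 if rng.random() < expit(x * x - 1.5) else 0 for x in xs]

propensity, alpha = ensemble_single_split(xs, ys)
sw = stabilized_weights(xs, ys, propensity)
mean_sw = sum(sw) / len(sw)
print("weight on marginal candidate:", alpha)
print("mean stabilized weight:", round(mean_sw, 3))
```

A V-fold variant in the spirit of SL would repeat the fit-and-evaluate step once per fold and average the validation losses before choosing the combination, which is where the extra computation time comes from.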

Similar articles

Estimating inverse probability weights using super learner when weight-model specification is unknown in a marginal structural Cox model context.

Stat Med. 2017 Jun 15;36(13):2032-2047. doi: 10.1002/sim.7266. Epub 2017 Feb 20.

Super learning to hedge against incorrect inference from arbitrary parametric assumptions in marginal structural modeling.

J Clin Epidemiol. 2013 Aug;66(8 Suppl):S99-109. doi: 10.1016/j.jclinepi.2013.01.016.

Using marginal structural measurement-error models to estimate the long-term effect of antiretroviral therapy on incident AIDS or death.

Am J Epidemiol. 2010 Jan 1;171(1):113-22. doi: 10.1093/aje/kwp329. Epub 2009 Nov 24.

A simulation study of finite-sample properties of marginal structural Cox proportional hazards models.

Stat Med. 2012 Aug 30;31(19):2098-109. doi: 10.1002/sim.5317. Epub 2012 Apr 11.

Constructing inverse probability weights for marginal structural models.

Am J Epidemiol. 2008 Sep 15;168(6):656-64. doi: 10.1093/aje/kwn164. Epub 2008 Aug 5.

Marginal structural models for case-cohort study designs to estimate the association of antiretroviral therapy initiation with incident AIDS or death.

Am J Epidemiol. 2012 Mar 1;175(5):381-90. doi: 10.1093/aje/kwr346. Epub 2012 Feb 1.

Machine Learning for Causal Inference: On the Use of Cross-fit Estimators.

Epidemiology. 2021 May 1;32(3):393-401. doi: 10.1097/EDE.0000000000001332.

Should a propensity score model be super? The utility of ensemble procedures for causal adjustment.

Stat Med. 2019 Apr 30;38(9):1690-1702. doi: 10.1002/sim.8075. Epub 2018 Dec 26.

The parametric g-formula to estimate the effect of highly active antiretroviral therapy on incident AIDS or death.

Stat Med. 2012 Aug 15;31(18):2000-9. doi: 10.1002/sim.5316. Epub 2012 Apr 11.

Cited by

[Comparing the effectiveness of multiple dynamic treatment strategies using target trial emulation].

Pravent Gesundh. 2023 Jun 12:1-11. doi: 10.1007/s11553-023-01033-8.

So Many Choices: A Guide to Selecting Among Methods to Adjust for Observed Confounders.

Stat Med. 2025 Feb 28;44(5):e10336. doi: 10.1002/sim.10336.

Human immune and metabolic biomarker levels, and stress-biomarker associations, differ by season: Implications for biomedical health research.

Brain Behav Immun Health. 2024 May 8;38:100793. doi: 10.1016/j.bbih.2024.100793. eCollection 2024 Jul.

Change in employment status and its causal effect on suicidal ideation and depressive symptoms: A marginal structural model with machine learning algorithms.

Scand J Work Environ Health. 2024 Apr 1;50(3):218-227. doi: 10.5271/sjweh.4150. Epub 2024 Mar 10.

Long-Term Outcomes of Early Coronary Artery Disease Testing After New-Onset Heart Failure.

Circ Heart Fail. 2023 Jul;16(7):e010426. doi: 10.1161/CIRCHEARTFAILURE.122.010426. Epub 2023 May 22.

Air Pollution and Cardiovascular and Thromboembolic Events in Older Adults With High-Risk Conditions.

Am J Epidemiol. 2023 Aug 4;192(8):1358-1370. doi: 10.1093/aje/kwad089.

Associations of polygenic risk scores with posttraumatic stress symptom trajectories following combat deployment.

Psychol Med. 2023 Oct;53(14):6733-6742. doi: 10.1017/S0033291723000211. Epub 2023 Mar 6.

Causal inference from observational data and target trial emulation.

Osteoarthritis Cartilage. 2022 Nov;30(11):1415-1417. doi: 10.1016/j.joca.2022.08.010. Epub 2022 Sep 2.

Neural Networks to Estimate Generalized Propensity Scores for Continuous Treatment Doses.

Eval Rev. 2021 Mar 3:193841X21992199. doi: 10.1177/0193841X21992199.

A biologist's guide to model selection and causal inference.

Proc Biol Sci. 2021 Jan 27;288(1943):20202815. doi: 10.1098/rspb.2020.2815.
