Targeted maximum likelihood based causal inference: Part I.

Author information

van der Laan Mark J

Affiliation

University of California - Berkeley, CA, USA.

Publication information

Int J Biostat. 2010;6(2):Article 2. doi: 10.2202/1557-4679.1211.

DOI:10.2202/1557-4679.1211

PMID:21969992

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3126670/

Abstract

Given causal graph assumptions, intervention-specific counterfactual distributions of the data can be defined by the so-called G-computation formula, which is obtained by carrying out these interventions on the likelihood of the data factorized according to the causal graph. The resulting G-computation formula represents the counterfactual distribution the data would have had if this intervention had been enforced on the system generating the data. A causal effect of interest can then be defined as some difference between these counterfactual distributions indexed by different interventions. For example, the interventions can represent static treatment regimens or individualized treatment rules that assign treatment in response to time-dependent covariates, and the causal effects could be defined in terms of features of the mean of the treatment-regimen-specific counterfactual outcome of interest as a function of the corresponding treatment regimens. Such features could be defined nonparametrically in terms of so-called (nonparametric) marginal structural models for static or individualized treatment rules, whose parameters can be thought of as (smooth) summary measures of differences between the treatment-regimen-specific counterfactual distributions. In this article, we develop a particular targeted maximum likelihood estimator of causal effects of multiple time point interventions. This involves using loss-based super-learning to obtain an initial estimate of the unknown factors of the G-computation formula, then applying a target-parameter-specific optimal fluctuation function (least favorable parametric submodel) to each estimated factor, estimating the fluctuation parameter(s) by maximum likelihood, and iterating this updating step of the initial factor until convergence. This iterative targeted maximum likelihood updating step makes the resulting estimator of the causal effect double robust in the sense that it is consistent if either the initial estimator is consistent, or the estimator of the optimal fluctuation function is consistent. The optimal fluctuation function is correctly specified if the conditional distributions of the nodes in the causal graph one intervenes upon are correctly specified. The latter conditional distributions often comprise the so-called treatment and censoring mechanism. Selection among different targeted maximum likelihood estimators (e.g., indexed by different initial estimators) can be based on loss-based cross-validation, such as likelihood-based cross-validation or cross-validation based on another appropriate loss function for the distribution of the data. Some specific loss functions are mentioned in this article. Subsequently, a variety of observations about this targeted maximum likelihood estimation procedure are made. This article provides the basis for the companion Part II article, which gives concrete demonstrations of the implementation of the targeted MLE in complex causal effect estimation problems.
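The updating step described in the abstract can be illustrated in its simplest special case. The following is a minimal sketch, not the article's multiple time point estimator: it assumes a single binary treatment `A`, a binary outcome `Y`, one baseline covariate `W`, and simulated data. Plain logistic regressions stand in for super-learning, and all function and variable names (`fit_logistic`, `tmle_ate`, `result`) are hypothetical. In this point-treatment setting with a logistic fluctuation, a single update already solves the score equation, so no iteration is shown.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=50):
    """Logistic regression by maximum likelihood (Newton-Raphson), with optional offset."""
    n, d = X.shape
    offset = np.zeros(n) if offset is None else offset
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def tmle_ate(W, A, Y):
    """Sketch of a point-treatment TMLE of E[Y(1)] - E[Y(0)] (illustrative assumption)."""
    n = len(Y)
    ones, zeros = np.ones(n), np.zeros(n)

    # Step 1: initial estimate of the outcome regression Q(A, W) = P(Y=1 | A, W).
    # A logistic regression is used here in place of a super learner.
    bq = fit_logistic(np.column_stack([ones, A, W]), Y)
    clip = lambda p: np.clip(p, 1e-6, 1.0 - 1e-6)
    Q_A = clip(expit(np.column_stack([ones, A, W]) @ bq))
    Q_1 = clip(expit(np.column_stack([ones, ones, W]) @ bq))
    Q_0 = clip(expit(np.column_stack([ones, zeros, W]) @ bq))

    # Step 2: estimate the treatment mechanism g(W) = P(A=1 | W), bounded away from 0 and 1.
    Xg = np.column_stack([ones, W])
    g1 = np.clip(expit(Xg @ fit_logistic(Xg, A)), 0.01, 0.99)

    # Step 3: fluctuation covariate for the average treatment effect.
    H_A = A / g1 - (1.0 - A) / (1.0 - g1)

    # Step 4: estimate the fluctuation parameter eps by maximum likelihood,
    # holding the initial fit fixed as an offset on the logit scale.
    eps = fit_logistic(H_A[:, None], Y, offset=logit(Q_A))[0]

    # Step 5: update the initial fit and plug in.
    Q_A_star = expit(logit(Q_A) + eps * H_A)
    Q_1_star = expit(logit(Q_1) + eps / g1)
    Q_0_star = expit(logit(Q_0) - eps / (1.0 - g1))
    return {
        "ate": float(np.mean(Q_1_star - Q_0_star)),
        "eps": float(eps),
        # Empirical mean of the fluctuation score; close to zero after the update.
        "score": float(np.mean(H_A * (Y - Q_A_star))),
    }

# Simulated data (assumed data-generating process, for illustration only).
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * W)).astype(float)
Y = rng.binomial(1, expit(-0.5 + A + 0.8 * W)).astype(float)

result = tmle_ate(W, A, Y)
print(result)
```

Fitting `eps` with the initial fit held fixed as an offset is what makes the update targeted: only the direction of the fluctuation covariate is adjusted, and the updated fit solves the corresponding score equation. In the multiple time point setting of the article, each factor of the G-computation formula would receive its own fluctuation and the update would be repeated until convergence.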




