Talbot Denis, Diop Awa, Mésidor Miceline, Chiu Yohann, Sirois Caroline, Spieker Andrew J, Pariente Antoine, Noize Pernelle, Simard Marc, Luque Fernandez Miguel Angel, Schomaker Michael, Fujita Kenji, Gnjidic Danijela, Schnitzer Mireille E
Département de Médecine Sociale et Préventive, Université Laval, Québec, Canada.
Axe Santé des Populations et Pratiques Optimales en Santé, Centre de Recherche du CHU de Québec - Université Laval, Québec, Canada.
Stat Med. 2025 Mar 15;44(6):e70034. doi: 10.1002/sim.70034.
Targeted maximum likelihood estimation (TMLE) is an increasingly popular framework for the estimation of causal effects. It requires modeling both the exposure and outcome but is doubly robust in the sense that it is valid if at least one of these models is correctly specified. In addition, TMLE allows for flexible modeling of both the exposure and outcome with machine learning methods. This provides better control for measured confounders since the model specification automatically adapts to the data, instead of needing to be specified by the analyst a priori. Despite these methodological advantages, TMLE remains less popular than alternatives in part because of its less accessible theory and implementation. While some tutorials have been proposed, none address the case of a time-to-event outcome. This tutorial provides a detailed step-by-step explanation of the implementation of TMLE for estimating the effect of a point binary or multilevel exposure on a time-to-event outcome, modeled as counterfactual survival curves and causal hazard ratios. The tutorial also provides guidelines on how best to use TMLE in practice, including aspects related to study design, choice of covariates, controlling biases and use of machine learning. R code is provided to illustrate each step using simulated data (https://github.com/detal9/SurvTMLE). To facilitate implementation, a general R function implementing TMLE with options to use machine learning is also provided. The method is illustrated in a real-data analysis concerning the effectiveness of statins for the prevention of a first cardiovascular disease among older adults in Québec, Canada, between 2013 and 2018.
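The double robustness described above can be illustrated with a small simulation. The following is a minimal sketch in Python, not the authors' R implementation (which is available at the repository cited in the abstract). It makes several simplifying assumptions that go beyond the abstract: the outcome is a binary indicator of an event by a single fixed time with no censoring, rather than a full time-to-event outcome; both models are plain logistic regressions rather than machine learning fits; and the data-generating mechanism, the function names (`fit_logistic`, `tmle_risk_difference`) and all coefficients are hypothetical. The outcome model deliberately omits a confounder while the exposure model is correct, so the targeting step is expected to remove most of the bias of the initial plug-in estimate.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    p = np.clip(p, 1e-9, 1 - 1e-9)  # guard against log(0)
    return np.log(p / (1 - p))

def fit_logistic(X, y, offset=0.0, n_iter=25):
    """Logistic regression coefficients by Newton-Raphson (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def tmle_risk_difference(W_exposure, W_outcome, A, Y):
    """Return (TMLE estimate, initial plug-in estimate) of the risk difference."""
    n = len(Y)
    ones, zeros = np.ones(n), np.zeros(n)

    # Step 1: exposure model g(W) = P(A = 1 | W), bounded away from 0 and 1.
    Xg = np.column_stack([ones, W_exposure])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), 0.01, 0.99)

    # Step 2: initial outcome model and predictions under A = 1 and A = 0.
    bQ = fit_logistic(np.column_stack([ones, A, W_outcome]), Y)
    Q1 = expit(np.column_stack([ones, ones, W_outcome]) @ bQ)
    Q0 = expit(np.column_stack([ones, zeros, W_outcome]) @ bQ)
    QA = np.where(A == 1, Q1, Q0)

    # Step 3: targeting step, a logistic fluctuation with "clever covariates"
    # and the initial predictions entered as a fixed offset.
    H = np.column_stack([A / g, (1 - A) / (1 - g)])
    eps = fit_logistic(H, Y, offset=logit(QA))

    # Step 4: updated predictions and plug-in estimate of the risk difference.
    Q1_star = expit(logit(Q1) + eps[0] / g)
    Q0_star = expit(logit(Q0) + eps[1] / (1 - g))
    return np.mean(Q1_star - Q0_star), np.mean(Q1 - Q0)

# Simulated data (assumed mechanism): W1 and W2 confound the effect of A on Y.
rng = np.random.default_rng(2025)
n = 50_000
W = rng.normal(size=(n, 2))
A = rng.binomial(1, expit(0.8 * W[:, 0] - 0.5 * W[:, 1]))
lin = -1.0 + W[:, 0] + 0.5 * W[:, 1]
Y = rng.binomial(1, expit(lin + 1.0 * A))

# Risk difference implied by the simulation, averaged over the sampled W.
truth = np.mean(expit(lin + 1.0) - expit(lin))

# Exposure model uses both covariates (correct); outcome model omits W1 (wrong).
tmle_est, plugin_est = tmle_risk_difference(W, W[:, [1]], A, Y)

print(f"truth = {truth:.3f}")
print(f"initial plug-in (misspecified outcome model) = {plugin_est:.3f}")
print(f"TMLE after targeting = {tmle_est:.3f}")
```

In this setup the initial plug-in estimate inherits the confounding left by the omitted covariate, while the targeted estimate relies on the correctly specified exposure model to correct it. Extending the sketch to censored time-to-event data, survival curves and hazard ratios requires the additional steps covered in the tutorial itself.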