Goetghebeur Els, le Cessie Saskia, De Stavola Bianca, Moodie Erica Em, Waernbaum Ingeborg
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
Stat Med. 2020 Dec 30;39(30):4922-4948. doi: 10.1002/sim.8741. Epub 2020 Sep 23.
Although review papers on causal inference methods are now available, there is a lack of introductory overviews on what they can deliver and on the guiding criteria for choosing one particular method. This tutorial gives an overview for situations where an exposure of interest is set at a chosen baseline ("point exposure") and the target outcome arises at a later time point. We first phrase relevant causal questions and make a case for being specific about the possible exposure levels involved and the populations for which the question is relevant. Using the potential outcomes framework, we describe principled definitions of causal effects and of estimation approaches, classified according to whether they invoke the no unmeasured confounding assumption (including outcome regression and propensity score-based methods) or an instrumental variable with added assumptions. We mainly focus on continuous outcomes and causal average treatment effects. We discuss interpretation, challenges, and potential pitfalls, and illustrate application using a "simulation learner" that mimics the effect of various breastfeeding interventions on a child's later development. This involves a typical simulation component with generated exposure, covariate, and outcome data inspired by a randomized intervention study. The simulation learner further generates various (linked) exposure types with a set of possible values per observation unit, from which observed as well as potential outcome data are generated. It thus provides true values of several causal effects. R code for data generation and analysis is available on www.ofcaus.org, where SAS and Stata code for analysis is also provided.
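A simulation that keeps both potential outcomes for every unit yields true causal effects. The following minimal Python sketch illustrates this idea. It is not the authors' R code from www.ofcaus.org. The binary confounder `L`, the binary exposure `A`, all coefficients, and the function names (`simulate`, `standardization`, `ipw`, and so on) are hypothetical. Because `L` is binary, outcome regression is replaced here by a saturated stratum-mean model averaged over `L` (standardization). The propensity score is estimated by stratum proportions before inverse probability weighting. The instrumental variable approach is not shown.

```python
import random


def simulate(n=20000, seed=2020):
    """Generate L, A, both potential outcomes, and the observed outcome.

    All distributions and coefficients are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0             # binary confounder
        a = 1 if rng.random() < 0.3 + 0.4 * l else 0   # exposure more likely when L = 1
        y0 = 100 + 5 * l + rng.gauss(0, 3)             # potential outcome if unexposed
        y1 = y0 + 2 + 1 * l                            # potential outcome if exposed (effect 2 + L)
        y = y1 if a == 1 else y0                       # consistency: only Y^A is observed
        rows.append((l, a, y0, y1, y))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def true_ate(rows):
    # Computable only because the simulation retains both potential outcomes.
    return mean([y1 - y0 for _, _, y0, y1, _ in rows])


def naive_difference(rows):
    # Exposed minus unexposed mean, ignoring the confounder L.
    return (mean([y for _, a, _, _, y in rows if a == 1])
            - mean([y for _, a, _, _, y in rows if a == 0]))


def standardization(rows):
    # Saturated outcome model E[Y | A, L], averaged over the empirical distribution of L.
    n = len(rows)
    est = 0.0
    for l in (0, 1):
        stratum = [r for r in rows if r[0] == l]
        m1 = mean([r[4] for r in stratum if r[1] == 1])
        m0 = mean([r[4] for r in stratum if r[1] == 0])
        est += (len(stratum) / n) * (m1 - m0)
    return est


def ipw(rows):
    # Propensity score P(A = 1 | L) estimated by stratum proportions,
    # followed by Horvitz-Thompson inverse probability weighting.
    n = len(rows)
    ps = {l: mean([r[1] for r in rows if r[0] == l]) for l in (0, 1)}
    treated = sum(r[4] / ps[r[0]] for r in rows if r[1] == 1) / n
    control = sum(r[4] / (1 - ps[r[0]]) for r in rows if r[1] == 0) / n
    return treated - control


if __name__ == "__main__":
    rows = simulate()
    print("true ATE:       ", round(true_ate(rows), 3))
    print("naive:          ", round(naive_difference(rows), 3))
    print("standardization:", round(standardization(rows), 3))
    print("IPW:            ", round(ipw(rows), 3))
```

Under these assumed coefficients the population average treatment effect is 2 + 0.5 = 2.5. The naive exposed-versus-unexposed contrast targets about 4.7 in the population, because it absorbs the confounding by `L`. With a saturated propensity model, the Horvitz-Thompson estimate coincides algebraically with standardization. Both estimators rely on the no unmeasured confounding assumption, which holds here by construction.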