Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia.
Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, 50 Flemington Road, 3052, Parkville, Australia.
BMC Med Res Methodol. 2023 Feb 16;23(1):42. doi: 10.1186/s12874-023-01843-6.
Despite recent advances in causal inference methods, outcome regression remains the most widely used approach for estimating causal effects in epidemiological studies with a single-point exposure and outcome. Missing data are common in these studies, and complete-case analysis (CCA) and multiple imputation (MI) are two frequently used methods for handling them. In randomised controlled trials (RCTs), it has been shown that MI should be conducted separately by treatment group. In observational studies, causal inference is now understood as the task of emulating an RCT, which raises the question of whether MI should be conducted by exposure group in such studies.
We addressed this question by evaluating the performance of seven methods for handling missing data when estimating causal effects with outcome regression. We conducted an extensive simulation study based on an illustrative case study from the Victorian Adolescent Health Cohort Study, assessing a range of scenarios, including seven outcome generation models with exposure-confounder interactions of differing strength.
The simulation results showed that MI by exposure group produced the least bias, provided that the smallest exposure group was relatively large. MI approaches that included the exposure-confounder interactions in the imputation model were the next best.
The findings from our simulation study, which was designed based on a real case study, suggest that current practice for the conduct of MI in causal inference may need to shift to stratifying by exposure group where feasible, or otherwise including exposure-confounder interactions in the imputation model.
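To make the idea of MI by exposure group concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary exposure, a single fully observed confounder, and an outcome that is missing completely at random. All names (`impute_by_exposure_group`, `adjusted_effect`, `pool_rubin`) and the simulated data are hypothetical. Imputation uses a bootstrap followed by a stochastic regression draw, a simple approximation to proper MI, and the analysis model is a main-effects outcome regression chosen only for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y on x: returns (intercept, slope, residuals)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
    return intercept, slope, residuals


def impute_by_exposure_group(rows, rng):
    """Return one completed dataset, imputing the outcome within each exposure group."""
    completed = [dict(r) for r in rows]
    for level in sorted({r["exposure"] for r in rows}):
        observed = [r for r in rows
                    if r["exposure"] == level and r["outcome"] is not None]
        # Bootstrap the observed cases so that uncertainty in the imputation
        # model's parameters carries through to the imputed values.
        boot = [rng.choice(observed) for _ in observed]
        a, b, resid = fit_line([r["confounder"] for r in boot],
                               [r["outcome"] for r in boot])
        sd = (sum(e * e for e in resid) / (len(boot) - 2)) ** 0.5
        for r in completed:
            if r["exposure"] == level and r["outcome"] is None:
                r["outcome"] = a + b * r["confounder"] + rng.gauss(0, sd)
    return completed


def adjusted_effect(rows):
    """Exposure coefficient and its variance from outcome ~ exposure + confounder."""
    c = [r["confounder"] for r in rows]
    # Residualise exposure and outcome on the confounder (Frisch-Waugh-Lovell).
    _, _, rx = fit_line(c, [r["exposure"] for r in rows])
    _, _, ry = fit_line(c, [r["outcome"] for r in rows])
    sxx = sum(x * x for x in rx)
    beta = sum(x * y for x, y in zip(rx, ry)) / sxx
    sigma2 = sum((y - beta * x) ** 2 for x, y in zip(rx, ry)) / (len(rows) - 3)
    return beta, sigma2 / sxx


def pool_rubin(estimates, variances):
    """Combine per-imputation results with Rubin's rules."""
    m = len(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return statistics.fmean(estimates), within + (1 + 1 / m) * between


# Simulated data (an assumption for illustration, not the cohort study data):
# the outcome model contains an exposure-confounder interaction.
rng = random.Random(2023)
rows = []
for _ in range(600):
    c = rng.gauss(0, 1)
    x = int(rng.random() < 0.5)
    y = 1 + 2 * x + 0.5 * c + 0.5 * x * c + rng.gauss(0, 1)
    rows.append({"exposure": x, "confounder": c,
                 "outcome": None if rng.random() < 0.3 else y})

# 20 imputations, each analysed separately and then pooled.
results = [adjusted_effect(impute_by_exposure_group(rows, rng)) for _ in range(20)]
pooled_estimate, total_variance = pool_rubin(*zip(*results))
print(f"pooled effect {pooled_estimate:.2f}, SE {total_variance ** 0.5:.2f}")
```

Because a separate imputation model is fitted within each exposure group, any exposure-confounder interaction in the data is preserved without being specified explicitly. With a small exposure group, the within-group fits become unstable, which is when a single imputation model with explicit interaction terms may be the more practical choice.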