Institute of Public Health, Charité - Universitätsmedizin, Berlin, Germany.
PreMeDICaL, Inria Sophia-Antipolis, Montpellier, France.
Biom J. 2023 Jun;65(5):e2100294. doi: 10.1002/bimj.202100294. Epub 2023 Mar 12.
We focus on the problem of generalizing a causal effect estimated in a randomized controlled trial (RCT) to a target population described by a set of covariates from observational data. Available methods, such as inverse propensity sampling weighting, are not designed to handle missing values, which are nevertheless common in both data sources. Beyond combining the assumptions for causal effect identifiability with those for the missing-values mechanism, and defining appropriate estimation strategies, a further difficulty is the specific structure of the data: two sources, with treatment and outcome available only in the RCT. We propose three multiple imputation strategies to handle missing values when generalizing treatment effects, each handling the multisource structure of the problem differently (separate imputation, joint imputation with fixed effect, joint imputation ignoring source information). As an alternative to multiple imputation, we also propose a direct estimation approach that treats incomplete covariates as semidiscrete variables. The multiple imputation strategies and this alternative rely on different sets of assumptions concerning the impact of missing values on identifiability. We discuss these assumptions and assess the methods through an extensive simulation study. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and an RCT studying the effect of tranexamic acid administration on mortality in major trauma patients admitted to intensive care units. The analysis illustrates how the handling of missing values can affect the conclusion about the effect generalized from the RCT to the target population.
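To make the weighting idea concrete, here is a minimal sketch of an inverse propensity sampling weighting estimator on complete data. It is an illustration under stated assumptions, not the authors' implementation: the function names `fit_logistic` and `ipsw_ate`, the logistic model for trial membership, the normalized (Hajek-style) weights, and the simulated data are all hypothetical choices, and the sketch does not cover the missing-value strategies described in the abstract. With multiple imputation, an estimator of this kind would typically be applied to each imputed dataset and the point estimates averaged.

```python
import numpy as np

def fit_logistic(X, s, n_iter=50, ridge=1e-6):
    """Fit P(s = 1 | X) by Newton-Raphson; returns coefficients, intercept first."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (s - p) - ridge * beta
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def ipsw_ate(X_rct, a, y, X_obs):
    """Reweight trial units so their covariates resemble the observational sample."""
    # Model membership in the trial (s = 1) versus the observational data (s = 0).
    X_all = np.vstack([X_rct, X_obs])
    s = np.concatenate([np.ones(len(X_rct)), np.zeros(len(X_obs))])
    beta = fit_logistic(X_all, s)
    Z = np.column_stack([np.ones(len(X_rct)), X_rct])
    pi = 1.0 / (1.0 + np.exp(-Z @ beta))
    w = (1.0 - pi) / pi  # inverse odds of being sampled into the trial
    treated = a == 1
    # Normalized weighted means in each arm (an assumed variant of the estimator).
    mu1 = np.sum(w[treated] * y[treated]) / np.sum(w[treated])
    mu0 = np.sum(w[~treated] * y[~treated]) / np.sum(w[~treated])
    return mu1 - mu0

# Simulated example (assumed data-generating process): the treatment effect is
# 1 + 2x, the trial has E[x] = 0 and the target population has E[x] = 0.5,
# so the trial effect is 1 and the target-population effect is 2 by construction.
rng = np.random.default_rng(0)
n = 4000
X_rct = rng.normal(0.0, 1.0, size=(n, 1))
X_obs = rng.normal(0.5, 1.0, size=(n, 1))
a = rng.integers(0, 2, size=n)
y = X_rct[:, 0] + a * (1.0 + 2.0 * X_rct[:, 0]) + rng.normal(0.0, 1.0, size=n)

naive = y[a == 1].mean() - y[a == 0].mean()  # unweighted trial estimate
est = ipsw_ate(X_rct, a, y, X_obs)           # estimate generalized to the target
print(f"naive trial estimate: {naive:.2f}, weighted estimate: {est:.2f}")
```

The weights are the estimated odds of not being in the trial, so trial participants who resemble the target population count more. Any row with a missing covariate would break both the logistic fit and the weights, which is the gap the imputation and direct estimation strategies are meant to address.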