Bertsimas Dimitris, Koulouras Angelos, Nagata Hiroshi, Gao Carol, Mizusawa Junki, Kanemitsu Yukihide, Margonis Georgios Antonios
Sloan School of Management and Operations Research Center, E62-560, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
National Cancer Center Hospital, Tokyo, Japan.
Res Sq. 2025 Mar 19:rs.3.rs-5576146. doi: 10.21203/rs.3.rs-5576146/v1.
Observational studies provide the only evidence on the effectiveness of interventions when randomized controlled trials (RCTs), apart from the initial RCT that establishes the efficacy of a treatment compared to a placebo, are impractical due to cost, ethical concerns, or time constraints. While many methodologies aim to draw causal inferences from observational data, there is a growing trend to model observational study designs after hypothetical or existing RCTs, a strategy known as "target trial emulation." Despite its potential, causal inference through target trial emulation is challenging because, without randomization, it cannot fully address the confounding bias inherent in real-world data. In this work, we present a novel framework for target trial emulation that aims to overcome several key limitations, including confounding bias. The framework proceeds as follows. First, we apply the eligibility criteria of a specific trial to an observational cohort derived from real-world data. We then "correct" this cohort by using optimization techniques to extract a subset that matches both the covariate distribution and the baseline prognosis (i.e., the prognosis in the trial's control group) of the target RCT. Next, we address unmeasured confounding by using cost-sensitive counterfactual models to adjust the prognosis estimates of the treated group so that they align with those observed in the trial. Following trial emulation, we go a step further and use the emulated cohort to train optimal decision trees, developed by our team, to identify subgroups of patients exhibiting heterogeneity in treatment effects (HTE). The absence of confounding is verified using two external models, and the validity of the treatment effects estimated by our framework is independently confirmed by the team responsible for the original trial we emulate.
To our knowledge, this is the first framework to successfully address both observed and unobserved confounding, a challenge that has historically limited the use of randomized trial emulation and causal inference in general since the 1950s. Additionally, our framework holds promise in advancing precision or personalized medicine by identifying patient subgroups that benefit most from specific treatments.
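The cohort "correction" step described above, extracting a subset whose covariates match those of the target RCT, can be illustrated with a minimal sketch. The abstract does not specify the optimization formulation, so the code below swaps in a simple greedy heuristic that matches covariate means only; it is not the authors' method. The function names `select_matching_subset` and `standardized_difference`, the synthetic data, and the target means are all hypothetical. Under the same assumptions, a baseline prognosis score could be appended as an extra column so that it is balanced alongside the covariates.

```python
import numpy as np


def select_matching_subset(X, target_means, k):
    """Greedily pick k rows of X whose covariate means approach target_means.

    Illustrative heuristic only: the abstract mentions "optimization
    techniques" without detail, so this is an assumed stand-in.
    """
    X = np.asarray(X, dtype=float)
    target = np.asarray(target_means, dtype=float)
    if not 0 < k <= len(X):
        raise ValueError("k must be between 1 and the cohort size")

    scale = X.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant covariates

    chosen = []
    available = np.ones(len(X), dtype=bool)
    running_sum = np.zeros(X.shape[1])

    for step in range(1, k + 1):
        # Subset mean that would result from adding each candidate next.
        candidate_means = (running_sum + X) / step
        # Sum of absolute standardized mean differences to the target.
        imbalance = np.abs((candidate_means - target) / scale).sum(axis=1)
        imbalance[~available] = np.inf  # never pick the same patient twice
        best = int(np.argmin(imbalance))
        chosen.append(best)
        available[best] = False
        running_sum += X[best]

    return np.array(chosen)


def standardized_difference(X_subset, target_means, scale):
    """Absolute standardized mean difference per covariate."""
    return np.abs((np.asarray(X_subset).mean(axis=0) - target_means) / scale)


if __name__ == "__main__":
    # Synthetic observational cohort (hypothetical data, 3 covariates).
    rng = np.random.default_rng(0)
    cohort = rng.normal(size=(500, 3))
    trial_means = np.array([0.5, -0.3, 0.2])  # assumed trial summary values

    idx = select_matching_subset(cohort, trial_means, k=100)
    scale = cohort.std(axis=0)
    print("full cohort:", standardized_difference(cohort, trial_means, scale))
    print("subset     :", standardized_difference(cohort[idx], trial_means, scale))
```

A greedy pass is used here only because it is short and has no solver dependency; an exact formulation (for example, a mixed-integer program over inclusion indicators) would be closer in spirit to the optimization the abstract refers to, and matching full distributions rather than means would require additional balance terms.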