Federated Target Trial Emulation using Distributed Observational Data for Treatment Effect Estimation.

Author information

Li Haoyang, Zang Chengxi, Xu Zhenxing, Pan Weishen, Rajendran Suraj, Chen Yong, Wang Fei

Affiliations

Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.

Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY, USA.

Publication information

medRxiv. 2025 May 5:2025.05.02.25326905. doi: 10.1101/2025.05.02.25326905.

DOI:10.1101/2025.05.02.25326905

PMID:40385404

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12083601/

Abstract

Target trial emulation (TTE) aims to estimate treatment effects by simulating randomized controlled trials using real-world observational data. Applying TTE across distributed datasets shows great promise for improving generalizability and statistical power, but is often infeasible due to privacy and data-sharing constraints. Here we propose a federated learning-based TTE framework, FL-TTE, that enables TTE across multiple sites without sharing patient-level data. FL-TTE incorporates federated protocol design, federated inverse probability of treatment weighting, and a federated Cox proportional hazards model to estimate time-to-event outcomes across heterogeneous data. We validated FL-TTE by emulating sepsis trials using eICU and MIMIC-IV data from 192 hospitals, and Alzheimer's trials using the INSIGHT Network across five New York City health systems. Compared with pooled results, FL-TTE produced less biased estimates than traditional meta-analysis methods, a finding that is also supported theoretically. FL-TTE enables federated treatment effect estimation across distributed and heterogeneous data in a privacy-preserving way.
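
The abstract does not spell out how the federated inverse probability of treatment weighting step works, so the following is only a minimal sketch of one common way such a step could be set up, not the authors' implementation. It assumes a single covariate, a logistic propensity model fitted by Newton's method, and sites that send only summed gradient and Hessian terms to a coordinator. All function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def site_summaries(rows, beta):
    """Runs locally at one site. rows is a list of (covariate, treated) pairs.
    Only five aggregate numbers leave the site, never patient-level rows."""
    b0, b1 = beta
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, t in rows:
        p = sigmoid(b0 + b1 * x)   # current propensity estimate
        r = t - p                  # score residual
        w = p * (1.0 - p)          # Newton curvature weight
        g0 += r
        g1 += r * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1, h00, h01, h11)

def federated_fit(sites, n_iter=25):
    """Coordinator: sum the site summaries and take Newton steps.
    Because gradients and Hessians are additive over patients, this is
    the same update a pooled fit would make."""
    beta = (0.0, 0.0)
    for _ in range(n_iter):
        summaries = [site_summaries(rows, beta) for rows in sites]
        g0, g1, h00, h01, h11 = (sum(v) for v in zip(*summaries))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 system H d = g
        d1 = (h00 * g1 - h01 * g0) / det
        beta = (beta[0] + d0, beta[1] + d1)
    return beta

def stabilized_weights(rows, beta, p_treated):
    """Runs locally: stabilized IPTW weights from the shared model."""
    weights = []
    for x, t in rows:
        e = sigmoid(beta[0] + beta[1] * x)
        weights.append(p_treated / e if t == 1 else (1.0 - p_treated) / (1.0 - e))
    return weights

# Toy data for two hypothetical sites (illustration only, not study data).
site_a = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1), (1.5, 1), (0.5, 0), (2.5, 0)]
site_b = [(0.2, 1), (1.2, 0), (2.2, 1), (3.2, 1), (0.8, 0), (1.8, 0)]

beta = federated_fit([site_a, site_b])
n_treated = sum(t for rows in (site_a, site_b) for _, t in rows)
n_total = len(site_a) + len(site_b)
weights_a = stabilized_weights(site_a, beta, n_treated / n_total)
```

In this sketch the federated fit matches a fit on the pooled rows up to floating-point rounding, which is the property a meta-analysis of separately fitted site models does not generally have. The resulting weights could then feed a weighted survival model at each site.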
