Li Siqi, Yan Mengying, Yuan Ruizhi, Liu Molei, Liu Nan, Hong Chuan
Center for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore.
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.
J Biomed Inform. 2025 May;165:104780. doi: 10.1016/j.jbi.2025.104780. Epub 2025 Mar 5.
We propose FedIMPUTE, a communication-efficient approach to missing value imputation (MVI) based on federated learning (FL). Our method enables multiple sites to perform MVI collaboratively in a privacy-preserving manner, addressing the challenges of data-sharing constraints and population heterogeneity.
We begin by conducting MVI locally at each participating site, followed by the application of various FL strategies, ranging from basic to advanced, to federate local MVI models without sharing site-specific data. The federated model is then broadcast and used by each site for MVI. We evaluate FedIMPUTE using both simulation studies and a real-world application on electronic health records (EHRs) to predict emergency department (ED) outcomes as a proof of concept.
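The three steps above (fit locally, federate, broadcast) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the local MVI model or the FL strategies, so this sketch assumes a linear regression imputation model at each site and a simple sample-size-weighted average of coefficients as the most basic federation strategy. The function names `fit_local_model`, `federate`, and `impute`, and the simulated data, are hypothetical.

```python
import numpy as np

def fit_local_model(X, y):
    """Step 1 (local): fit a least-squares imputation model on rows where y is observed.

    Assumption: X is fully observed and only y has missing values (NaN).
    Returns the coefficients and the number of rows used, both of which are
    summary statistics rather than patient-level data.
    """
    observed = ~np.isnan(y)
    design = np.column_stack([np.ones(observed.sum()), X[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    return coef, int(observed.sum())

def federate(local_fits):
    """Step 2 (federate): sample-size-weighted average of the local coefficients."""
    coefs = np.array([coef for coef, _ in local_fits])
    weights = np.array([n for _, n in local_fits], dtype=float)
    return np.average(coefs, axis=0, weights=weights)

def impute(X, y, coef):
    """Step 3 (broadcast and apply): fill missing y with the federated model's predictions."""
    filled = y.copy()
    missing = np.isnan(y)
    design = np.column_stack([np.ones(missing.sum()), X[missing]])
    filled[missing] = design @ coef
    return filled

# Simulated example: three sites of different sizes, about 30% of y missing at each.
rng = np.random.default_rng(0)
sites = []
for n in (200, 80, 40):
    X = rng.normal(size=(n, 2))
    y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=n)
    y[rng.random(n) < 0.3] = np.nan
    sites.append((X, y))

local_fits = [fit_local_model(X, y) for X, y in sites]     # runs at each site
global_coef = federate(local_fits)                          # runs at the coordinator
imputed = [impute(X, y, global_coef) for X, y in sites]     # runs at each site again
```

Only the coefficient vectors and sample counts leave each site in this sketch, which matches the abstract's description of sharing non-patient-level summary statistics; more advanced strategies that account for heterogeneity would replace the plain weighted average in `federate`.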
Simulation studies show that FedIMPUTE outperforms all baseline MVI methods under comparison, improving downstream prediction performance and effectively handling data heterogeneity across sites. On ED datasets from three hospitals within the Duke University Health System (DUHS), FedIMPUTE achieves the lowest mean squared error (MSE) among the benchmark MVI methods, indicating superior imputation accuracy. FedIMPUTE also provides good downstream prediction performance, matching or outperforming the other benchmark methods.
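As an illustration of how imputation MSE can be measured, one common approach is to hide a subset of observed values, impute them, and compare the imputed values with the hidden truth. The abstract does not state how the MSE was computed, so the masking scheme below is an assumption, and `masked_mse` is a hypothetical helper.

```python
import numpy as np

def masked_mse(true_values, imputed_values, mask):
    """Mean squared error over the entries that were artificially hidden (mask == True)."""
    diff = true_values[mask] - imputed_values[mask]
    return float(np.mean(diff ** 2))

# Toy example: entries 1 and 3 were hidden and then imputed.
truth = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, True])
imputed_values = np.array([1.0, 2.5, 3.0, 3.0])

mse = masked_mse(truth, imputed_values, mask)
print(mse)  # 0.625
```

A lower value means the imputed entries sit closer to the held-out true values.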
FedIMPUTE enhances the performance of downstream risk prediction tasks, particularly for sites with high missing data rates and small sample sizes. It is easy to implement and communication-efficient, requiring sites to share only non-patient-level summary statistics.