D3MI: an efficient and powerful federated imputation method for bias reduction in the analysis of distributed incomplete data by accounting for within-site correlation and between-site heterogeneity.

Authors

Lian Yi, Jiang Xiaoqian, Long Qi

Affiliations

Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.

McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA.

Publication information

medRxiv. 2025 May 8:2025.05.08.25327224. doi: 10.1101/2025.05.08.25327224.

Abstract

OBJECTIVE

Electronic health records (EHRs) collected from diverse healthcare institutions offer a rich and representative data source for clinical research. Federated learning enables analysis of these distributed data without sharing sensitive patient-level information, preserving privacy. However, missing data remain a major challenge and can introduce substantial bias if not properly addressed. Very few distributed imputation methods currently exist, and they fail to account for two critical aspects of EHR data: correlation within sites and variability across sites. We aim to fill this important methodological gap.

METHODS

We propose Distributed Mixed Model-based Multiple Imputation (D3MI), a novel federated imputation method designed to reduce bias in the analysis of distributed EHRs. D3MI combines the strengths of federated learning techniques, statistical learning methods for correlated data, and multilevel imputation algorithms to explicitly account for both within-site correlation and between-site heterogeneity using site-specific random effects. It preserves privacy by avoiding the sharing of raw data, and it is efficient in both communication and computation.
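
As a rough illustration of how site-specific random effects can capture both features at once, consider a generic random-intercept model. This is a textbook sketch under our own assumptions, not necessarily the exact specification used by D3MI; the indices and symbols (site i, patient j, outcome y, covariates x, variances tau^2 and sigma^2) are introduced here for illustration only.

```latex
% Illustrative random-intercept model (assumed notation, not taken from the paper):
% i indexes sites, j indexes patients within site i.
\begin{align*}
  y_{ij} &= x_{ij}^{\top}\beta + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2),
  \quad b_i \perp \varepsilon_{ij}, \\
  % Two patients from the same site share b_i, which induces within-site correlation:
  \operatorname{Cov}(y_{ij}, y_{ij'} \mid x) &= \tau^2 \quad (j \neq j'),
  \qquad
  \operatorname{Corr}(y_{ij}, y_{ij'} \mid x) = \frac{\tau^2}{\tau^2 + \sigma^2}, \\
  % Patients from different sites share no random effect:
  \operatorname{Cov}(y_{ij}, y_{i'j'} \mid x) &= 0 \quad (i \neq i').
\end{align*}
```

In this sketch, the variance tau^2 of the site-level intercepts measures between-site heterogeneity, and the same parameter determines the within-site correlation tau^2 / (tau^2 + sigma^2). An imputation model that omits b_i in effect sets tau^2 to zero and so ignores both features.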

RESULTS

Through extensive simulation studies, we demonstrate that D3MI outperforms state-of-the-art distributed imputation methods in both accuracy and consistency. We further demonstrate the use of D3MI in a real-world EHR case study involving incomplete and clustered data from participating hospitals in the Georgia Coverdell Acute Stroke Registry.

CONCLUSION

By explicitly modeling the complex structure of distributed EHR data, D3MI addresses key limitations of existing approaches. It provides a powerful and efficient solution for handling missing data in distributed and privacy-sensitive settings and enhances the rigor and reproducibility of collaborative clinical research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c6/12083571/b7f04a598f88/nihpp-2025.05.08.25327224v1-f0002.jpg
