D3MI: an efficient and powerful federated imputation method for bias reduction in the analysis of distributed incomplete data by accounting for within-site correlation and between-site heterogeneity.

Authors

Lian Yi, Jiang Xiaoqian, Long Qi

Affiliations

Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.

McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA.

Publication information

medRxiv. 2025 May 8:2025.05.08.25327224. doi: 10.1101/2025.05.08.25327224.

DOI:10.1101/2025.05.08.25327224
PMID:40385381
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12083571/
Abstract

OBJECTIVE

Electronic health records (EHRs) collected from diverse healthcare institutions offer a rich and representative data source for clinical research. Federated learning enables analysis of these distributed data without sharing sensitive patient-level information, preserving privacy. However, missing data remain a major challenge and can introduce substantial bias if not properly addressed. Very few distributed imputation methods currently exist, and they fail to account for two critical aspects of EHR data: correlation within sites and variability across sites. We aim to fill this important methodological gap.

METHODS

We propose Distributed Mixed Model-based Multiple Imputation (D3MI), a novel federated imputation method designed to reduce bias in the analysis of distributed EHRs. D3MI integrates the strengths of federated learning techniques, statistical learning methods for correlated data, and multilevel imputation algorithms to explicitly account for both within-site correlation and between-site heterogeneity using site-specific random effects. It preserves privacy by avoiding the sharing of raw data, and it is efficient in both communication and computation.
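To make the idea of imputing with site-specific random effects more concrete, here is a minimal sketch. It is not the authors' D3MI algorithm, which the abstract does not spell out. It assumes a deliberately simple random-intercept model, y_ij = mu + b_i + e_ij, with no covariates and values missing completely at random. All function names (`local_summary`, `fit_random_intercept`, `impute_site`, `rubin_pool`) are hypothetical. Each site sends only three aggregates to a coordinator, which fits the model by a method-of-moments (one-way ANOVA) estimator. Each site then imputes locally, and the results are pooled with Rubin's rules. For brevity the fitted parameters are treated as fixed, so the imputations are not fully "proper" in the multiple-imputation sense.

```python
import numpy as np

def local_summary(y_obs):
    """Runs at a site: only these aggregates leave the site, never raw rows."""
    return {"n": y_obs.size, "sum": y_obs.sum(), "sumsq": (y_obs ** 2).sum()}

def fit_random_intercept(summaries):
    """Runs at the coordinator: ANOVA-type fit of y_ij = mu + b_i + e_ij."""
    n = np.array([s["n"] for s in summaries], dtype=float)
    sums = np.array([s["sum"] for s in summaries])
    sumsq = np.array([s["sumsq"] for s in summaries])
    k, total_n = n.size, n.sum()
    site_means = sums / n
    mu = sums.sum() / total_n
    ss_within = (sumsq - n * site_means ** 2).sum()
    ss_between = (n * (site_means - mu) ** 2).sum()
    sigma2 = ss_within / (total_n - k)                 # within-site variance
    n0 = (total_n - (n ** 2).sum() / total_n) / (k - 1)
    tau2 = max((ss_between / (k - 1) - sigma2) / n0, 0.0)  # between-site variance
    return mu, tau2, sigma2

def impute_site(y, mu, tau2, sigma2, rng):
    """Runs at a site: draw the site effect, then fill in missing values."""
    miss = np.isnan(y)
    obs = y[~miss]
    shrink = tau2 / (tau2 + sigma2 / obs.size)  # weight on the site's own mean
    b_mean = shrink * (obs.mean() - mu)         # conditional mean of b_i
    b_var = shrink * sigma2 / obs.size          # conditional variance of b_i
    b_draw = rng.normal(b_mean, np.sqrt(b_var))
    out = y.copy()
    out[miss] = mu + b_draw + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
    return out

def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputations."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    return q.mean(), u.mean() + (1.0 + 1.0 / m) * q.var(ddof=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated stand-in for 5 sites; about 30% of values are set to missing.
    sites = []
    for b in rng.normal(0.0, 1.0, 5):
        y = 10.0 + b + rng.normal(0.0, 2.0, 200)
        y[rng.random(200) < 0.3] = np.nan
        sites.append(y)
    mu, tau2, sigma2 = fit_random_intercept(
        [local_summary(y[~np.isnan(y)]) for y in sites]
    )
    ests, variances = [], []
    for _ in range(20):  # 20 imputed data sets
        means = np.array(
            [impute_site(y, mu, tau2, sigma2, rng).mean() for y in sites]
        )
        ests.append(means.mean())
        variances.append(means.var(ddof=1) / means.size)
    print(rubin_pool(ests, variances))
```

The shrinkage weight in `impute_site` is what ties the two ideas together. A site with few observed values is pulled toward the overall mean, while a well-observed site keeps its own level, so between-site heterogeneity is preserved in the imputed values rather than averaged away.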

RESULTS

Through extensive simulation studies, we demonstrate that D3MI outperforms state-of-the-art distributed imputation methods in both accuracy and consistency. We further demonstrate the use of D3MI in a real-world EHR case study involving incomplete and clustered data from participating hospitals in the Georgia Coverdell Acute Stroke Registry.

CONCLUSION

By explicitly modeling the complex structure of distributed EHR data, D3MI addresses key limitations of existing approaches. It provides a powerful and efficient solution for handling missing data in distributed and privacy-sensitive settings and enhances the rigor and reproducibility of collaborative clinical research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c6/12083571/65b9e99fed33/nihpp-2025.05.08.25327224v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c6/12083571/b7f04a598f88/nihpp-2025.05.08.25327224v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c6/12083571/b5d44bfa055e/nihpp-2025.05.08.25327224v1-f0003.jpg