Effective Non-IID Degree Estimation for Robust Federated Learning in Healthcare Datasets.

Authors

Chen Kun-Yi, Shyu Chi-Ren, Tsai Yuan-Yu, Baskett William I, Chang Chi-Yu, Chou Che-Yi, Tsai Jeffrey J P, Shae Zon-Yin

Affiliations

Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211 USA.

Department of Computer Science and Information Engineering, Asia University, Taichung, 413305 Taiwan.

Publication information

J Healthc Inform Res. 2025 Mar 22;9(3):437-464. doi: 10.1007/s41666-025-00195-8. eCollection 2025 Sep.

DOI: 10.1007/s41666-025-00195-8
PMID: 40726743
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12290147/

Abstract

Building unbiased and robust machine learning models using datasets from multiple healthcare systems is critical for addressing the needs of diverse patient populations. However, variations in patient demographics and healthcare protocols across systems often lead to significant differences in data distributions. Data that are not independent and identically distributed (non-IID) present a major challenge in developing effective federated learning (FL) frameworks. This study proposes a method to estimate the non-IID degree between datasets and introduces three metrics (variability, separability, and computational time) to evaluate and compare the performance of non-IID degree estimation methods. We developed a novel non-IID FL algorithm that incorporates the proposed non-IID degree estimation index as a regularization term into existing FL algorithms for acute kidney injury (AKI) risk prediction. Our results demonstrate that the proposed method for estimating non-IID degree outperforms previous approaches: it effectively identifies differences in data distributions between datasets, consistently produces similar estimates of non-IID degree when evaluating different subsamples from the same dataset, requires significantly less computational time, and provides better interpretability. Finally, we showed that the proposed non-IID FL algorithm achieves higher test accuracy than local learning, concurrent FL algorithms, and centralized learning for the AKI prediction task.
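The abstract does not spell out the estimator or the exact metric definitions, so the following is only a rough sketch of how a non-IID degree estimator might be scored on the three criteria it names. Everything here is an assumption made for illustration: the stand-in index (mean per-feature two-sample Kolmogorov-Smirnov statistic), the definitions of variability (spread of estimates across subsamples) and separability (gap between cross-site and same-site estimates), the function names, and the synthetic Gaussian "sites". None of it is the authors' method.

```python
import random
import statistics
import time
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def non_iid_degree(data_a, data_b):
    """Stand-in index (assumption, not the paper's): mean per-feature KS statistic in [0, 1]."""
    n_features = len(data_a[0])
    per_feature = [
        ks_statistic([row[j] for row in data_a], [row[j] for row in data_b])
        for j in range(n_features)
    ]
    return statistics.fmean(per_feature)


def evaluate(estimator, site_a, site_b, n_rounds=20, sample_size=200, seed=0):
    """Score an estimator on assumed versions of the three metrics."""
    rng = random.Random(seed)
    start = time.perf_counter()
    # Estimates between subsamples drawn from two different sites.
    between = [
        estimator(rng.sample(site_a, sample_size), rng.sample(site_b, sample_size))
        for _ in range(n_rounds)
    ]
    # Estimates between subsamples drawn from the same site.
    within = [
        estimator(rng.sample(site_a, sample_size), rng.sample(site_a, sample_size))
        for _ in range(n_rounds)
    ]
    elapsed = time.perf_counter() - start
    return {
        "variability": statistics.stdev(between),  # lower means more stable
        "separability": statistics.fmean(between) - statistics.fmean(within),
        "seconds": elapsed,
    }


# Synthetic stand-ins for two healthcare sites: same features, shifted means.
data_rng = random.Random(42)
site_a = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(1000)]
site_b = [[data_rng.gauss(1.0, 1.0) for _ in range(3)] for _ in range(1000)]

report = evaluate(non_iid_degree, site_a, site_b)
print(report)
```

A different estimator can be passed to `evaluate` in place of `non_iid_degree` to compare candidates on the same subsamples, which mirrors the kind of comparison the abstract describes.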
