Privacy-preserving patient clustering for personalized federated learning.

Authors

Elhussein Ahmed, Gürsoy Gamze

Affiliations

Department of Biomedical Informatics, Columbia University, New York Genome Center, New York City, NY, U.S.A.

Department of Biomedical Informatics, Department of Computer Science, Columbia University, New York Genome Center, New York City, NY, U.S.A.

Publication information

Proc Mach Learn Res. 2023;219:150-166.

Abstract

Federated Learning (FL) is a machine learning framework that enables multiple organizations to train a model without sharing their data with a central server. However, its performance degrades significantly when the data are not independent and identically distributed (non-IID). This is a problem in medical settings, where variations in patient populations contribute significantly to distribution differences across hospitals. Personalized FL addresses this issue by accounting for site-specific distribution differences. Clustered FL, a Personalized FL variant, does so by clustering patients into groups across hospitals and training a separate model on each group. However, privacy remains a challenge because the clustering process requires exchanging patient-level information. Previous work avoided this exchange by forming clusters from aggregated data, which led to inaccurate groups and performance degradation. In this study, we propose Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel Clustered FL framework that can cluster patients using patient-level data while protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic technique, to securely calculate patient-level similarity scores across hospitals. We evaluate PCBFL by training a federated mortality prediction model using 20 sites from the eICU dataset, and we compare its performance gain against traditional and existing Clustered FL frameworks. Our results show that PCBFL successfully forms clinically meaningful cohorts of low-, medium-, and high-risk patients. PCBFL outperforms traditional and existing Clustered FL frameworks, with an average AUC improvement of 4.3% and an average AUPRC improvement of 7.8%.
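The abstract does not specify which Secure Multiparty Computation protocol PCBFL uses, so the following is only a generic sketch of how two hospitals could compute a similarity score (here, a dot product) between two patient feature vectors without revealing the vectors themselves. It assumes two-party additive secret sharing with Beaver triples supplied by a trusted dealer, and it simulates both parties in one process. All names (`share`, `secure_mul`, `secure_dot`), the modulus `P`, and the fixed-point `SCALE` are hypothetical choices for illustration, not taken from the paper.

```python
import secrets

P = 2**61 - 1   # prime modulus for the arithmetic field (assumed)
SCALE = 1000    # fixed-point scale for real-valued features (assumed)

def share(x):
    """Split a field element x into two additive shares modulo P."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def reconstruct(s0, s1):
    """Recombine two additive shares into the underlying field element."""
    return (s0 + s1) % P

def decode(v):
    """Map a field element back to a signed integer."""
    return v - P if v > P // 2 else v

def beaver_triple():
    """Trusted-dealer triple (a, b, c = a*b), each value secret-shared."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)

def secure_mul(x_sh, y_sh):
    """Multiply two secret-shared values; only the masked d and e are opened."""
    a_sh, b_sh, c_sh = beaver_triple()
    d = reconstruct((x_sh[0] - a_sh[0]) % P, (x_sh[1] - a_sh[1]) % P)
    e = reconstruct((y_sh[0] - b_sh[0]) % P, (y_sh[1] - b_sh[1]) % P)
    # x*y = c + d*b + e*a + d*e; the public d*e term is added by one party only
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % P
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % P
    return z0, z1

def secure_dot(u, v):
    """Dot product of hospital A's vector u and hospital B's vector v."""
    acc0 = acc1 = 0
    for ui, vi in zip(u, v):
        x_sh = share(round(ui * SCALE) % P)   # A secret-shares its feature
        y_sh = share(round(vi * SCALE) % P)   # B secret-shares its feature
        z0, z1 = secure_mul(x_sh, y_sh)
        acc0 = (acc0 + z0) % P                # each party sums its own shares
        acc1 = (acc1 + z1) % P
    # Only the final score is reconstructed, then rescaled from fixed point
    return decode(reconstruct(acc0, acc1)) / SCALE**2

patient_a = [0.5, -1.0, 2.0]    # toy features held by hospital A
patient_b = [1.5, 2.0, 0.25]    # toy features held by hospital B
score = secure_dot(patient_a, patient_b)
```

In this sketch each party only ever sees random-looking shares and the masked values `d` and `e`; the plaintext vectors are never exchanged. A real deployment would run the parties on separate machines, replace the trusted dealer with a triple-generation protocol, and likely normalize the score (for example, to a cosine similarity) before clustering.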
