Privacy-preserving patient clustering for personalized federated learning.

Authors

Elhussein Ahmed, Gürsoy Gamze

Affiliations

Department of Biomedical Informatics, Columbia University, New York Genome Center, New York City, NY, U.S.A.

Department of Biomedical Informatics, Department of Computer Science, Columbia University, New York Genome Center, New York City, NY, U.S.A.

Publication information

Proc Mach Learn Res. 2023;219:150-166.

PMID:39239484

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11376435/

Abstract

Federated Learning (FL) is a machine learning framework that enables multiple organizations to train a model without sharing their data with a central server. However, its performance degrades significantly when the data are not independent and identically distributed (non-IID). This is a problem in medical settings, where variation in patient populations contributes significantly to distribution differences across hospitals. Personalized FL addresses this issue by accounting for site-specific distribution differences. Clustered FL, a Personalized FL variant, has been used to address this problem by clustering patients into groups across hospitals and training a separate model on each group. However, privacy remained a challenge because the clustering process requires the exchange of patient-level information. Earlier work avoided this exchange by forming clusters from aggregated data, which led to inaccurate groups and performance degradation. In this study, we propose Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel Clustered FL framework that can cluster patients using patient-level data while protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic technique, to securely calculate patient-level similarity scores across hospitals. We evaluate PCBFL by training a federated mortality prediction model using 20 sites from the eICU dataset, and we compare its performance gain against traditional and existing Clustered FL frameworks. Our results show that PCBFL successfully forms clinically meaningful cohorts of low-, medium-, and high-risk patients. PCBFL outperforms traditional and existing Clustered FL frameworks, with an average AUC improvement of 4.3% and an average AUPRC improvement of 7.8%.
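
To illustrate how a patient-level similarity score could be computed across two hospitals without either side revealing its raw features, here is a minimal Python sketch of a secure dot product built on additive secret sharing and Beaver multiplication triples. It is an illustration only, not the protocol used in PCBFL, which the abstract does not describe. The modulus `P`, the fixed-point factor `SCALE`, the trusted dealer in `beaver_triple`, and all function names are assumptions made for this example. Both parties are simulated in a single process and are assumed to be semi-honest.

```python
import secrets

P = 2**61 - 1   # prime modulus of the finite field (illustrative choice)
SCALE = 1000    # fixed-point factor for encoding real-valued features (assumption)

def share(x):
    """Split a field element x into two additive shares that sum to x mod P."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def encode(vec):
    """Map real-valued features to field elements using fixed-point encoding."""
    return [round(f * SCALE) % P for f in vec]

def decode(x, scale):
    """Map a field element back to a signed real number."""
    if x > P // 2:
        x -= P
    return x / scale

def beaver_triple():
    """Hypothetical trusted dealer: shares of random a, b and of c = a * b."""
    a = secrets.randbelow(P)
    b = secrets.randbelow(P)
    return share(a), share(b), share(a * b % P)

def secure_dot(x_vec, y_vec):
    """Hospital 0 holds x_vec, hospital 1 holds y_vec; returns their dot product.

    Variables suffixed 0 belong to hospital 0 and those suffixed 1 to hospital 1.
    Only the masked values d and e are exchanged in the clear.
    """
    z0 = z1 = 0
    for x, y in zip(encode(x_vec), encode(y_vec)):
        x0, x1 = share(x)   # hospital 0 keeps x0 and sends x1
        y0, y1 = share(y)   # hospital 1 keeps y1 and sends y0
        (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
        # d = x - a and e = y - b are uniformly random, so they leak nothing.
        d = (x0 - a0 + x1 - a1) % P
        e = (y0 - b0 + y1 - b1) % P
        # Local shares of x * y = c + d*b + e*a + d*e.
        z0 = (z0 + c0 + d * b0 + e * a0 + d * e) % P
        z1 = (z1 + c1 + d * b1 + e * a1) % P
    # Only the final sum is reconstructed; products carry SCALE squared.
    return decode((z0 + z1) % P, SCALE * SCALE)
```

For example, `secure_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` returns `32.0`, the same value a plaintext dot product would give. In a real deployment the shares would travel over the network between sites, and the dealer would be replaced by an offline triple-generation protocol.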
