Federated Learning for Sparse Bayesian Models with Applications to Electronic Health Records and Genomics.

Author information

Department of Statistics, Texas A&M University, College Station, Texas 77843, USA.

Publication information

Pac Symp Biocomput. 2023;28:484-495.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9782716/

Abstract

Federated learning is becoming increasingly popular as concern about privacy breaches rises across disciplines, including the biological and biomedical fields. The main idea is to train models locally on each server, using data that are available only to that server, and to aggregate the model information (not the data) at the global level. While federated learning has made significant advances for machine learning methods such as deep neural networks, to the best of our knowledge, its development for sparse Bayesian models is still lacking. Sparse Bayesian models are highly interpretable and provide natural uncertainty quantification, a desirable property for many scientific problems. However, without a federated learning algorithm, their applicability to sensitive biological/biomedical data from multiple sources is limited. Therefore, to fill this gap in the literature, we propose a new Bayesian federated learning framework that is capable of pooling information from different data sources without breaching privacy. The proposed method is conceptually simple to understand and implement, accommodates sampling heterogeneity (i.e., non-iid observations) across data sources, and allows for principled uncertainty quantification. We illustrate the proposed framework with three concrete sparse Bayesian models, namely, sparse regression, Markov random fields, and directed graphical models. The application of these three models is demonstrated through three real data examples: a multi-hospital COVID-19 study, breast cancer protein-protein interaction networks, and gene regulatory networks.
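The general idea of training locally and aggregating model information rather than data can be illustrated with a small sketch. This is not the algorithm proposed in the paper, which the abstract does not specify; it is a generic, non-sparse illustration that assumes a one-covariate Gaussian regression with known noise variance and a conjugate Gaussian prior. Each site shares only its local posterior mean and precision, and the server multiplies the local Gaussian posteriors together. All function names (`local_posterior`, `aggregate`, `simulate_site`) and the simulated data are hypothetical.

```python
import random


def local_posterior(x, y, noise_var, prior_var, n_sites):
    """Gaussian posterior (mean, precision) of the slope beta at one site.

    Assumed model: y_i = beta * x_i + e_i, with e_i ~ N(0, noise_var).
    The N(0, prior_var) prior is split evenly across sites so that it is
    counted exactly once after aggregation.
    """
    precision = 1.0 / (prior_var * n_sites) + sum(xi * xi for xi in x) / noise_var
    mean = (sum(xi * yi for xi, yi in zip(x, y)) / noise_var) / precision
    return mean, precision


def aggregate(summaries):
    """Combine local Gaussian posteriors by multiplying their densities."""
    precision = sum(p for _, p in summaries)
    mean = sum(m * p for m, p in summaries) / precision
    return mean, precision


def simulate_site(rng, n, beta, noise_sd, x_shift):
    """Simulated site whose covariate distribution is shifted by x_shift."""
    x = [rng.gauss(x_shift, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, noise_sd) for xi in x]
    return x, y


rng = random.Random(0)
# Three hypothetical sites with different sample sizes and covariate shifts.
sites = [
    simulate_site(rng, n, beta=2.0, noise_sd=1.0, x_shift=shift)
    for n, shift in [(50, 0.0), (80, 1.0), (30, -2.0)]
]

# Each site computes its summary locally; raw (x, y) never leaves the site.
summaries = [local_posterior(x, y, 1.0, 10.0, len(sites)) for x, y in sites]
fed_mean, fed_precision = aggregate(summaries)

# Reference only: the posterior one would get if all data could be pooled.
pooled_x = [xi for x, _ in sites for xi in x]
pooled_y = [yi for _, y in sites for yi in y]
pooled_mean, pooled_precision = local_posterior(pooled_x, pooled_y, 1.0, 10.0, 1)

print(fed_mean, fed_precision)
print(pooled_mean, pooled_precision)
```

In this conjugate setting the aggregated posterior matches the pooled-data posterior up to floating-point error, which is why only two numbers per site need to be communicated. Sparse priors generally do not give Gaussian posteriors, so this exact recovery should not be assumed to carry over to the models discussed in the abstract.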

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c5ea/9782716/6bd68bfcdc2c/nihms-1853306-f0001.jpg
