Federated Learning for Sparse Bayesian Models with Applications to Electronic Health Records and Genomics.

Affiliation

Department of Statistics, Texas A&M University, College Station, Texas 77843, USA.

Publication information

Pac Symp Biocomput. 2023;28:484-495.

Abstract

Federated learning is becoming increasingly popular as concern over privacy breaches rises across disciplines, including the biological and biomedical fields. The main idea is to train models locally on each server, using data that are available only to that server, and to aggregate the model (not data) information at the global level. While federated learning has made significant advances for machine learning methods such as deep neural networks, to the best of our knowledge, its development for sparse Bayesian models is still lacking. Sparse Bayesian models are highly interpretable and provide natural uncertainty quantification, a desirable property for many scientific problems. However, without a federated learning algorithm, their applicability to sensitive biological and biomedical data from multiple sources is limited. To fill this gap in the literature, we propose a new Bayesian federated learning framework that is capable of pooling information from different data sources without breaching privacy. The proposed method is conceptually simple to understand and implement, accommodates sampling heterogeneity (i.e., non-iid observations) across data sources, and allows for principled uncertainty quantification. We illustrate the proposed framework with three concrete sparse Bayesian models, namely, sparse regression, Markov random fields, and directed graphical models. The application of these three models is demonstrated through three real data examples: a multi-hospital COVID-19 study, breast cancer protein-protein interaction networks, and gene regulatory networks.
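The idea of aggregating model information rather than data can be illustrated with a toy example. The sketch below is not the framework proposed in the paper: it swaps in a one-coefficient conjugate Gaussian regression (no sparsity prior, no handling of non-iid sites) in which each site shares only two sufficient statistics and the server combines them into a posterior. All names (`LocalSummary`, `local_summary`, `aggregate_posterior`) and the values of `noise_var` and `prior_var` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class LocalSummary:
    """Model-level information a site shares; raw records stay on the site."""
    sxx: float  # sum of x_i^2 over the site's records
    sxy: float  # sum of x_i * y_i over the site's records


def local_summary(x: Sequence[float], y: Sequence[float]) -> LocalSummary:
    """Computed locally on each server from data only that server can see."""
    return LocalSummary(
        sxx=sum(xi * xi for xi in x),
        sxy=sum(xi * yi for xi, yi in zip(x, y)),
    )


def aggregate_posterior(
    summaries: Sequence[LocalSummary],
    noise_var: float = 1.0,   # assumed known noise variance (hypothetical value)
    prior_var: float = 10.0,  # assumed N(0, prior_var) prior on beta (hypothetical value)
) -> Tuple[float, float]:
    """Global step: posterior mean and variance of beta in y = beta * x + noise.

    Illustrative conjugate Gaussian model only, not a sparse Bayesian model.
    """
    precision = 1.0 / prior_var + sum(s.sxx for s in summaries) / noise_var
    mean = (sum(s.sxy for s in summaries) / noise_var) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    # Two hypothetical sites with made-up toy data.
    site_a = local_summary([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
    site_b = local_summary([4.0, 5.0], [8.1, 9.8])
    post_mean, post_var = aggregate_posterior([site_a, site_b])
    print(post_mean, post_var)
```

In this toy setting the aggregated posterior coincides with the one obtained by pooling all records on a single machine, because the Gaussian likelihood factorizes over sites. Sparse priors and heterogeneous sites generally do not admit such a closed form, which is the harder case the abstract refers to.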


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c5ea/9782716/6bd68bfcdc2c/nihms-1853306-f0001.jpg
