Federated Learning for Sparse Bayesian Models with Applications to Electronic Health Records and Genomics.

Affiliation

Department of Statistics, Texas A&M University, College Station, Texas 77843, USA.

Publication information

Pac Symp Biocomput. 2023;28:484-495.

Abstract

Federated learning is becoming increasingly popular as concern about privacy breaches grows across disciplines, including the biological and biomedical fields. The main idea is to train models locally on each server using data that are available only to that server, and to aggregate the model (not data) information at the global level. While federated learning has made significant advances for machine learning methods such as deep neural networks, to the best of our knowledge its development for sparse Bayesian models is still lacking. Sparse Bayesian models are highly interpretable and offer natural uncertainty quantification, a desirable property for many scientific problems. However, without a federated learning algorithm, their applicability to sensitive biological and biomedical data from multiple sources is limited. To fill this gap in the literature, we propose a new Bayesian federated learning framework that is capable of pooling information from different data sources without breaching privacy. The proposed method is conceptually simple to understand and implement, accommodates sampling heterogeneity (i.e., non-iid observations) across data sources, and allows for principled uncertainty quantification. We illustrate the proposed framework with three concrete sparse Bayesian models: sparse regression, Markov random field, and directed graphical models. The application of these three models is demonstrated through three real data examples: a multi-hospital COVID-19 study, breast cancer protein-protein interaction networks, and gene regulatory networks.
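To make the "aggregate the model, not the data" idea concrete, the following is a minimal sketch and not the framework proposed in the paper. It assumes a deliberately simplified setting: a one-coefficient linear regression with known noise variance and a conjugate Gaussian prior (not a sparsity-inducing prior), with iid observations across sites. The names `LocalSummary`, `local_summary`, and `aggregate`, the prior and noise settings, and the toy numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LocalSummary:
    """Sufficient statistics a site shares instead of its raw records."""
    n: int       # number of local observations
    sxx: float   # sum of x_i^2 over local observations
    sxy: float   # sum of x_i * y_i over local observations


def local_summary(x: Sequence[float], y: Sequence[float]) -> LocalSummary:
    """Runs on each site; only the returned summary leaves the site."""
    return LocalSummary(
        n=len(x),
        sxx=sum(a * a for a in x),
        sxy=sum(a * b for a, b in zip(x, y)),
    )


def aggregate(
    summaries: List[LocalSummary],
    noise_var: float = 1.0,   # assumed known noise variance (illustrative)
    prior_var: float = 10.0,  # variance of the N(0, prior_var) prior (illustrative)
) -> Tuple[float, float]:
    """Runs on the central server; returns the posterior mean and variance."""
    sxx = sum(s.sxx for s in summaries)
    sxy = sum(s.sxy for s in summaries)
    precision = 1.0 / prior_var + sxx / noise_var
    mean = (sxy / noise_var) / precision
    return mean, 1.0 / precision


# Toy data for three hypothetical sites (made-up numbers, roughly y = 2x).
sites = [
    ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]),
    ([0.5, 1.5], [1.0, 3.2]),
    ([4.0, 5.0, 6.0, 7.0], [8.1, 9.8, 12.1, 14.0]),
]

# Federated route: each site summarizes locally, the server combines summaries.
summaries = [local_summary(x, y) for x, y in sites]
fed_mean, fed_var = aggregate(summaries)

# Reference route: pool all raw data in one place (what privacy rules forbid).
pooled_x = [v for x, _ in sites for v in x]
pooled_y = [v for _, y in sites for v in y]
pooled_mean, pooled_var = aggregate([local_summary(pooled_x, pooled_y)])

print(f"federated posterior: mean={fed_mean:.4f}, var={fed_var:.6f}")
print(f"pooled posterior:    mean={pooled_mean:.4f}, var={pooled_var:.6f}")
```

In this conjugate toy case the federated posterior matches the pooled-data posterior because the likelihood depends on the data only through additive sufficient statistics. Sparse priors and non-iid sampling across sites, which the abstract says the proposed framework accommodates, do not reduce to such a simple sum, and this sketch does not attempt to handle them.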


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c5ea/9782716/6bd68bfcdc2c/nihms-1853306-f0001.jpg
