
Federated learning of medical concepts embedding using BEHRT.

Author Information

Ben Shoham Ofir, Rappoport Nadav

Affiliations

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel.

Publication Information

JAMIA Open. 2024 Oct 23;7(4):ooae110. doi: 10.1093/jamiaopen/ooae110. eCollection 2024 Dec.

Abstract

OBJECTIVES

Electronic health record (EHR) data is often considered sensitive medical information. As a result, EHR data from different medical centers often cannot be shared, making it difficult to create prediction models from multicenter EHR data, which is essential for such models' robustness and generalizability. Federated learning (FL) is an algorithmic approach that allows learning a shared model using data in multiple locations without the need to store all data in a single central place. Our study aims to evaluate an FL approach using the BEHRT model for predictive tasks on EHR data, focusing on next visit prediction.
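To make the federated setting concrete, the sketch below shows one round of federated averaging (FedAvg), in which each site trains on its own EHR data locally and only model weights, never patient records, are shared and averaged. This is a minimal illustration under assumed names; `local_train` and the client data loaders are hypothetical placeholders, not the authors' implementation.

```python
import copy
import torch

def fedavg_round(global_model, client_loaders, local_train):
    """One round of federated averaging (FedAvg).

    Each client receives a copy of the shared model, updates it on its own
    local data, and returns only the updated weights; the server averages
    the weights to produce the next shared model.
    """
    client_states = []
    for loader in client_loaders:
        local_model = copy.deepcopy(global_model)
        local_train(local_model, loader)  # hypothetical local update step
        client_states.append(local_model.state_dict())

    # Parameter-wise average across clients (unweighted, for simplicity).
    avg_state = copy.deepcopy(client_states[0])
    for key in avg_state:
        avg_state[key] = torch.stack(
            [state[key].float() for state in client_states]
        ).mean(dim=0)

    global_model.load_state_dict(avg_state)
    return global_model
```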

MATERIALS AND METHODS

We propose an FL approach for learning medical concept embeddings. The resulting pretrained model can be fine-tuned for specific downstream tasks. Our approach is based on an embedding model such as BEHRT, a deep neural sequence transduction model for EHR data. We use FL to train both the masked language modeling (MLM) model and the next visit downstream model.
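The two-stage procedure described here can be outlined roughly as below, assuming a BERT-style implementation: the model classes come from Hugging Face Transformers as stand-ins for BEHRT, `fedavg_round` refers to the sketch above, and the data loaders, local step functions, and round counts are hypothetical placeholders rather than the released code.

```python
from transformers import BertConfig, BertForMaskedLM, BertForSequenceClassification

# Assumed vocabulary of medical concepts and set of diagnosis labels;
# the paper's actual configuration may differ.
config = BertConfig(vocab_size=5000, num_labels=1000,
                    problem_type="multi_label_classification")

# Stage 1: federated MLM pretraining of the BEHRT-style concept-embedding model.
mlm_model = BertForMaskedLM(config)
for _ in range(pretrain_rounds):
    mlm_model = fedavg_round(mlm_model, mlm_client_loaders, local_mlm_step)

# Stage 2: initialize the next-visit (multi-label diagnosis) model from the
# pretrained encoder, then continue training it with FL on the downstream task.
next_visit_model = BertForSequenceClassification(config)
next_visit_model.bert.load_state_dict(
    mlm_model.bert.state_dict(), strict=False  # MLM encoder has no pooler
)
for _ in range(finetune_rounds):
    next_visit_model = fedavg_round(next_visit_model,
                                    visit_client_loaders, local_finetune_step)
```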

RESULTS

We demonstrate our approach on the MIMIC-IV dataset. We compare the performance of a model trained with FL to one trained on centralized data, observing a difference in average precision ranging from 0% to 3% (absolute), depending on the length of the patients' visit history. Moreover, our approach improves average precision by 4%-10% (absolute) compared to local models. In addition, we show the importance of using a pretrained MLM for the next visit diagnoses prediction task.

DISCUSSION AND CONCLUSION

We find that our FL approach comes very close to the performance of a centralized model and outperforms local models in terms of average precision. We also show that a pretrained MLM improves the model's average precision in the next visit diagnoses prediction task, compared to an MLM without pretraining.
