

Federated learning of medical concepts embedding using BEHRT.

Author information

Ben Shoham Ofir, Rappoport Nadav

Affiliations

Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel.

Publication information

JAMIA Open. 2024 Oct 23;7(4):ooae110. doi: 10.1093/jamiaopen/ooae110. eCollection 2024 Dec.

DOI: 10.1093/jamiaopen/ooae110
PMID: 39445033
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11498200/
Abstract

OBJECTIVES

Electronic health record (EHR) data is often considered sensitive medical information. Therefore, EHR data from different medical centers often cannot be shared, making it difficult to create prediction models from multicenter EHR data, which is essential for such models' robustness and generalizability. Federated learning (FL) is an algorithmic approach that allows learning a shared model from data held in multiple locations without storing all the data in a single central place. Our study aims to evaluate an FL approach that uses the BEHRT model for predictive tasks on EHR data, focusing on next-visit prediction.

MATERIALS AND METHODS

We propose an FL approach for learning medical concept embeddings. The pretrained model can then be fine-tuned for specific downstream tasks. Our approach is based on an embedding model like BEHRT, a deep neural sequence transduction model for EHR data. Using FL, we train both the masked language modeling (MLM) model and the next-visit downstream model.
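The core FL idea described above can be illustrated with a minimal FedAvg-style sketch. This is a hypothetical toy example, not the paper's training code: each "client" (medical center) keeps its data locally, runs a local gradient step, and the server averages only the model parameters, weighted by dataset size. A 1-D least-squares model stands in for the MLM/next-visit networks.

```python
# Minimal FedAvg sketch (hypothetical): raw data never leaves a client;
# only model weights travel to the server for averaging.

def local_update(w, data, lr=0.1):
    """One local gradient step for y = w * x, standing in for a
    client's local MLM / next-visit model update."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def fedavg_round(global_w, client_datasets):
    """Server broadcasts global weights; clients update locally;
    server averages the returned weights, weighted by dataset size."""
    updates = [local_update(global_w, d) for d in client_datasets]
    total = sum(len(d) for d in client_datasets)
    return sum(w * len(d) for w, d in zip(updates, client_datasets)) / total

# Three hypothetical hospitals, each holding private (x, y) pairs
# drawn from the same underlying relation y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(0.5, 1.0), (4.0, 8.0)]]
w = 0.0
for _ in range(200):
    w = fedavg_round(w, clients)
print(round(w, 2))  # converges to the shared solution w = 2.0
```

The averaging step is the crux: the shared model benefits from all clients' data while each client's records stay in place.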

RESULTS

We demonstrate our approach on the MIMIC-IV dataset. Comparing a model trained with FL to one trained on centralized data, we observe a difference in average precision ranging from 0% to 3% (absolute), depending on the length of the patients' visit history. Moreover, our approach improves average precision by 4%-10% (absolute) compared to local models. In addition, we show the importance of using a pretrained MLM for the next-visit diagnosis prediction task.
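Average precision (AP), the metric reported above, can be computed for one ranked prediction list as the mean of precision@k over the ranks k at which a true label appears. The scores and labels below are made up for illustration; this is not the study's evaluation code.

```python
# Toy illustration of average precision (AP) over a ranked list of
# predicted diagnosis codes for one patient's next visit.

def average_precision(scores, labels):
    """labels[i] = 1 if code i is a true next-visit diagnosis;
    scores[i] is the model's confidence for code i."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / k)  # precision@k at each hit
    return sum(precisions) / max(hits, 1)

# Hypothetical ranking: true codes sit at ranks 1 and 3.
ap = average_precision([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0])
print(round(ap, 3))  # (1/1 + 2/3) / 2 = 0.833
```

A 4%-10% absolute gain on a metric like this means the FL model ranks true diagnoses noticeably higher than models trained on any single center's data alone.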

DISCUSSION AND CONCLUSION

We find that our FL approach comes very close to the performance of a centralized model and outperforms local models in terms of average precision. We also show that a pretrained MLM improves the model's average precision on the next-visit diagnosis prediction task, compared to an MLM without pretraining.


Similar articles

1
Federated learning of medical concepts embedding using BEHRT.
JAMIA Open. 2024 Oct 23;7(4):ooae110. doi: 10.1093/jamiaopen/ooae110. eCollection 2024 Dec.
2
Disease Concept-Embedding Based on the Self-Supervised Method for Medical Information Extraction from Electronic Health Records and Disease Retrieval: Algorithm Development and Validation Study.
J Med Internet Res. 2021 Jan 27;23(1):e25113. doi: 10.2196/25113.
3
Secure Extraction of Personal Information from EHR by Federated Machine Learning.
Stud Health Technol Inform. 2024 Aug 22;316:611-615. doi: 10.3233/SHTI240488.
4
Predicting treatment response in multicenter non-small cell lung cancer patients based on federated learning.
BMC Cancer. 2024 Jun 5;24(1):688. doi: 10.1186/s12885-024-12456-7.
5
Training a Deep Contextualized Language Model for International Classification of Diseases, 10th Revision Classification via Federated Learning: Model Development and Validation Study.
JMIR Med Inform. 2022 Nov 10;10(11):e41342. doi: 10.2196/41342.
6
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.
NPJ Digit Med. 2021 May 20;4(1):86. doi: 10.1038/s41746-021-00455-y.
7
EHR-BERT: A BERT-based model for effective anomaly detection in electronic health records.
J Biomed Inform. 2024 Feb;150:104605. doi: 10.1016/j.jbi.2024.104605. Epub 2024 Feb 6.
8
BEHRT: Transformer for Electronic Health Records.
Sci Rep. 2020 Apr 28;10(1):7155. doi: 10.1038/s41598-020-62922-y.
9
Decentralized collaborative multi-institutional PET attenuation and scatter correction using federated deep learning.
Eur J Nucl Med Mol Imaging. 2023 Mar;50(4):1034-1050. doi: 10.1007/s00259-022-06053-8. Epub 2022 Dec 12.
10
Performance of federated learning-based models in the Dutch TAVI population was comparable to central strategies and outperformed local strategies.
Front Cardiovasc Med. 2024 Jul 5;11:1399138. doi: 10.3389/fcvm.2024.1399138. eCollection 2024.

Cited by

1
CPLLM: Clinical prediction with large language models.
PLOS Digit Health. 2024 Dec 6;3(12):e0000680. doi: 10.1371/journal.pdig.0000680. eCollection 2024 Dec.

References

1
FedEHR: A Federated Learning Approach towards the Prediction of Heart Diseases in IoT-Based Electronic Health Records.
Diagnostics (Basel). 2023 Oct 10;13(20):3166. doi: 10.3390/diagnostics13203166.
2
Author Correction: MIMIC-IV, a freely accessible electronic health record dataset.
Sci Data. 2023 Apr 18;10(1):219. doi: 10.1038/s41597-023-02136-9.
3
Federated learning enables big data for rare cancer boundary detection.
Nat Commun. 2022 Dec 5;13(1):7346. doi: 10.1038/s41467-022-33407-5.
4
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.
NPJ Digit Med. 2021 May 20;4(1):86. doi: 10.1038/s41746-021-00455-y.
5
Bidirectional Representation Learning From Transformers Using Multimodal Electronic Health Record Data to Predict Depression.
IEEE J Biomed Health Inform. 2021 Aug;25(8):3121-3129. doi: 10.1109/JBHI.2021.3063721. Epub 2021 Aug 5.
6
Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review.
J Biomed Inform. 2021 Mar;115:103671. doi: 10.1016/j.jbi.2020.103671. Epub 2020 Dec 31.
7
Federated Learning for Healthcare Informatics.
J Healthc Inform Res. 2021;5(1):1-19. doi: 10.1007/s41666-020-00082-4. Epub 2020 Nov 12.
8
BEHRT: Transformer for Electronic Health Records.
Sci Rep. 2020 Apr 28;10(1):7155. doi: 10.1038/s41598-020-62922-y.
9
Electronic Health Records: Then, Now, and in the Future.
Yearb Med Inform. 2016 May 20;Suppl 1(Suppl 1):S48-61. doi: 10.15265/IYS-2016-s006.
10
Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records.
Sci Rep. 2016 May 17;6:26094. doi: 10.1038/srep26094.