Pretrained patient trajectories for adverse drug event prediction using common data model-based electronic health records.

Author information

Kim Junmo, Kim Joo Seong, Lee Ji-Hyang, Kim Min-Gyu, Kim Taehyun, Cho Chaeeun, Park Rae Woong, Kim Kwangsoo

Affiliations

Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea.

Division of Gastroenterology, Department of Internal Medicine, Dongguk University Ilsan Hospital, Dongguk University College of Medicine, Goyang, Republic of Korea.

Publication information

Commun Med (Lond). 2025 Jun 13;5(1):232. doi: 10.1038/s43856-025-00914-7.

DOI:10.1038/s43856-025-00914-7

PMID:40514403

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12166071/

Abstract

BACKGROUND

Pretraining language models on electronic health record (EHR) data has improved performance across various medical tasks. Despite this potential, the use of EHR pretraining models to predict adverse drug events (ADEs) has not been explored.

METHODS

We used Observational Medical Outcomes Partnership (OMOP) common data model (CDM)-based EHR data from Seoul National University Hospital (SNUH), collected between January 2001 and December 2023, and from Ajou University Medical Center (AUMC), collected between January 2004 and December 2023. In total, 510,879 adult inpatients from SNUH and 419,505 from AUMC were included in the internal and external datasets, respectively. For pretraining, the model was trained to infer randomly masked tokens from the preceding and following history. In this process, we introduced domain embedding (DE), which tells the model the domain of each masked token and thereby prevents it from searching for codes in irrelevant domains. For qualitative analysis, we identified important features using the attention matrix from each finetuned model.
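The masking step with domain information can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary, the domain list, the 15% masking rate, the embedding size, and the helper names `mask_with_domain` and `embed` are all assumptions made for illustration. The only idea taken from the abstract is that a masked token still carries an embedding of its domain, so the model knows which kind of code to recover.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; each code belongs to one CDM-style domain.
VOCAB = ["[PAD]", "[MASK]", "cond:gastritis", "cond:hypertension",
         "drug:naproxen", "drug:omeprazole", "meas:creatinine", "proc:endoscopy"]
DOMAINS = ["special", "condition", "drug", "measurement", "procedure"]
CODE_DOMAIN = np.array([0, 0, 1, 1, 2, 2, 3, 4])  # domain id of each vocabulary entry
MASK_ID = 1
IGNORE = -100   # label value for positions that are not predicted
DIM = 16        # assumed embedding size

token_emb = rng.normal(size=(len(VOCAB), DIM))
domain_emb = rng.normal(size=(len(DOMAINS), DIM))


def mask_with_domain(token_ids, mask_prob=0.15, rng=rng):
    """Randomly mask tokens while keeping each position's domain id visible."""
    token_ids = np.asarray(token_ids)
    domain_ids = CODE_DOMAIN[token_ids]          # looked up before masking
    is_masked = rng.random(token_ids.shape) < mask_prob
    if not is_masked.any():                      # guarantee at least one target
        is_masked[rng.integers(len(token_ids))] = True
    inputs = np.where(is_masked, MASK_ID, token_ids)
    labels = np.where(is_masked, token_ids, IGNORE)
    return inputs, domain_ids, labels


def embed(inputs, domain_ids):
    """Input representation: token embedding plus domain embedding."""
    return token_emb[inputs] + domain_emb[domain_ids]


trajectory = [2, 4, 6, 7, 3, 5]                  # one patient's coded history
inputs, domain_ids, labels = mask_with_domain(trajectory)
x = embed(inputs, domain_ids)                    # shape: (sequence length, DIM)
```

In this sketch a masked drug code and a masked condition code receive different input vectors even though both share the `[MASK]` token embedding, which is one plausible way for domain information to narrow the set of candidate codes.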

RESULTS

Here we show that EHR pretraining models with DE outperform models that lack pretraining and DE in predicting various ADEs, with an average area under the receiver operating characteristic curve (AUROC) of 0.958 in internal validation and 0.964 in external validation. In the feature importance analysis, the results are consistent with previously reported clinical knowledge. In addition to cohort-level interpretation, patient-level interpretation is also available.
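One way attention weights could yield both patient-level and cohort-level importance is sketched below. The abstract states only that important features were identified from the attention matrix of each finetuned model; reading the row of a `[CLS]` token, averaging over heads, and averaging patient scores across the cohort are assumptions, as are the function names `patient_importance` and `cohort_importance`.

```python
import numpy as np


def patient_importance(attn, tokens, cls_index=0):
    """Score each code for one patient.

    attn   : array of shape (heads, seq_len, seq_len), attention weights
             from one layer of a finetuned model (each row sums to 1).
    tokens : the seq_len code strings for this patient.
    Returns a dict mapping code -> attention received from the [CLS] position.
    """
    attn = np.asarray(attn, dtype=float)
    cls_row = attn.mean(axis=0)[cls_index]       # average heads, take [CLS] row
    scores = {}
    for position, (token, weight) in enumerate(zip(tokens, cls_row)):
        if position == cls_index:
            continue                             # skip [CLS] attending to itself
        scores[token] = scores.get(token, 0.0) + float(weight)
    return scores


def cohort_importance(per_patient_scores):
    """Average patient-level scores; codes absent for a patient count as 0."""
    totals = {}
    for scores in per_patient_scores:
        for token, weight in scores.items():
            totals[token] = totals.get(token, 0.0) + weight
    n_patients = len(per_patient_scores)
    return {token: total / n_patients for token, total in totals.items()}


# Toy example with two heads and a three-token sequence.
uniform = [1 / 3, 1 / 3, 1 / 3]
attn = np.array([
    [[0.2, 0.5, 0.3], uniform, uniform],
    [[0.4, 0.1, 0.5], uniform, uniform],
])
tokens = ["[CLS]", "drug:naproxen", "cond:gastritis"]
one_patient = patient_importance(attn, tokens)
cohort = cohort_importance([one_patient, {"drug:naproxen": 0.5}])
```

The patient-level dictionary explains a single prediction, while the cohort-level dictionary summarizes which codes tend to receive attention across patients.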

CONCLUSIONS

The CDM-based EHR pretraining model with DE can improve prediction performance for various ADEs and can provide appropriate explanations at the cohort and patient levels. Our model has the potential to serve as a foundation model owing to its strong prediction performance, interpretability, and compatibility.
