Pretrained patient trajectories for adverse drug event prediction using common data model-based electronic health records.

Authors

Kim Junmo, Kim Joo Seong, Lee Ji-Hyang, Kim Min-Gyu, Kim Taehyun, Cho Chaeeun, Park Rae Woong, Kim Kwangsoo

Affiliations

Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea.

Division of Gastroenterology, Department of Internal Medicine, Dongguk University Ilsan Hospital, Dongguk University College of Medicine, Goyang, Republic of Korea.

Publication information

Commun Med (Lond). 2025 Jun 13;5(1):232. doi: 10.1038/s43856-025-00914-7.

DOI:10.1038/s43856-025-00914-7
PMID:40514403
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12166071/
Abstract

BACKGROUND

Pretraining language models on electronic health record (EHR) data has improved performance across various medical tasks. Despite this potential, the use of EHR pretraining models to predict adverse drug events (ADEs) has not been explored.

METHODS

We used Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)-based EHR data from Seoul National University Hospital (SNUH), collected between January 2001 and December 2023, and from Ajou University Medical Center (AUMC), collected between January 2004 and December 2023. In total, 510,879 adult inpatients from SNUH and 419,505 from AUMC were included in the internal and external datasets, respectively. For pretraining, the model was trained to infer randomly masked tokens from the preceding and following history. In this process, we introduced a domain embedding (DE) that supplies the domain of each masked token, preventing the model from predicting codes from irrelevant domains. For qualitative analysis, we identified important features using the attention matrix from each fine-tuned model.
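The masking step with a domain embedding can be illustrated with a minimal sketch. It is not the authors' implementation: the toy trajectory, the domain names, the embedding size, and the helper names (`mask_trajectory`, `build_table`, `embed`) are all assumptions made for illustration. The one idea it shows is that a masked position loses its code but keeps its domain, so the input at that position still indicates which kind of code is missing.

```python
import random

MASK = "[MASK]"

# Hypothetical toy trajectory: (code, domain) pairs in chronological order.
trajectory = [
    ("metformin", "drug"),
    ("type 2 diabetes", "condition"),
    ("HbA1c", "measurement"),
    ("insulin", "drug"),
    ("hypoglycemia", "condition"),
]

def mask_trajectory(events, mask_prob, rng):
    """Hide randomly chosen codes behind MASK while keeping every domain label."""
    tokens, domains, labels = [], [], []
    for code, domain in events:
        masked = rng.random() < mask_prob
        tokens.append(MASK if masked else code)
        domains.append(domain)  # the domain stays visible even when the code is masked
        labels.append(code if masked else None)  # training target only where masked
    return tokens, domains, labels

def build_table(keys, dim, rng):
    """Random lookup table standing in for a learned embedding matrix."""
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}

def embed(tokens, domains, token_table, domain_table):
    """Input embedding = token embedding + domain embedding, per position."""
    return [
        [t + d for t, d in zip(token_table[tok], domain_table[dom])]
        for tok, dom in zip(tokens, domains)
    ]

rng = random.Random(0)
token_table = build_table(sorted({c for c, _ in trajectory} | {MASK}), 4, rng)
domain_table = build_table(sorted({d for _, d in trajectory}), 4, rng)

tokens, domains, labels = mask_trajectory(trajectory, 0.3, rng)
inputs = embed(tokens, domains, token_table, domain_table)
```

Without the domain term, every masked position would receive the same input vector. With it, a masked drug and a masked condition differ at the input, which is one plausible way the model could be steered away from codes in irrelevant domains.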

RESULTS

Here we show that EHR pretraining models with DE outperform the models without pretraining and DE in predicting various ADEs, with average areas under the receiver operating characteristic curve (AUROC) of 0.958 and 0.964 in internal and external validation, respectively. In the feature importance analysis, the results are consistent with previously reported clinical knowledge. In addition to cohort-level interpretation, patient-level interpretation is also available.
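For readers unfamiliar with the metric, the sketch below computes AUROC for a single prediction task and then averages it across tasks. The labels, scores, and task names are made-up toy values, not study data, and the pairwise formulation is chosen for clarity rather than speed. How the authors averaged across ADEs is not stated in the abstract, so the unweighted mean here is an assumption.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    # Count pairwise wins for the positives; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_auroc(tasks):
    """Unweighted mean of per-task AUROC values."""
    values = [auroc(labels, scores) for labels, scores in tasks.values()]
    return sum(values) / len(values)

# Hypothetical toy tasks: each maps to (true labels, predicted scores).
toy_tasks = {
    "ade_a": ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    "ade_b": ([0, 1, 0, 1], [0.20, 0.90, 0.30, 0.70]),
}

per_task = {name: auroc(y, s) for name, (y, s) in toy_tasks.items()}
average = mean_auroc(toy_tasks)
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from non-cases.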

CONCLUSIONS

The CDM-based EHR pretraining model with DE can improve prediction performance for various ADEs and can provide explanations at both the cohort and patient levels. Our model has the potential to serve as a foundation model because of its strong prediction performance, interpretability, and compatibility.
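One way to read attention weights as patient-level and cohort-level explanations is sketched below. The abstract does not say how the authors aggregate the attention matrix, so everything here is an assumption: the sketch takes one attention weight per input code (for example, a single row of the matrix), normalizes it within each patient, and averages the result over patients. The function names and toy values are hypothetical.

```python
from collections import defaultdict

def patient_importance(codes, attention):
    """Share of one patient's attention mass that falls on each code."""
    total = sum(attention)
    scores = defaultdict(float)
    for code, weight in zip(codes, attention):
        scores[code] += weight / total  # repeated codes accumulate their shares
    return dict(scores)

def cohort_importance(patients):
    """Average the patient-level shares over all patients in the cohort."""
    sums = defaultdict(float)
    for codes, attention in patients:
        for code, share in patient_importance(codes, attention).items():
            sums[code] += share
    return {code: value / len(patients) for code, value in sums.items()}

# Hypothetical toy cohort: (codes in the record, attention weight per code).
toy_cohort = [
    (["drug_a", "lab_b"], [1.0, 3.0]),
    (["drug_a", "dx_c"], [2.0, 2.0]),
]

first_patient = patient_importance(*toy_cohort[0])
cohort = cohort_importance(toy_cohort)
```

A code that is absent from a patient's record contributes zero for that patient, so the cohort score reflects both how often a code appears and how much attention it draws when it does.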

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae4b/12166071/a54864e4cb5c/43856_2025_914_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae4b/12166071/1315ad1c466e/43856_2025_914_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae4b/12166071/6470a9213b42/43856_2025_914_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae4b/12166071/12dbe875fea8/43856_2025_914_Fig4_HTML.jpg
