Enhancing Clinical Relevance of Pretrained Language Models Through Integration of External Knowledge: Case Study on Cardiovascular Diagnosis From Electronic Health Records.

Authors

Lu Qiuhao, Wen Andrew, Nguyen Thien, Liu Hongfang

Affiliations

McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, United States.

Department of AI and Informatics, Mayo Clinic, Rochester, MN, United States.

Publication information

JMIR AI. 2024 Aug 6;3:e56932. doi: 10.2196/56932.


DOI: 10.2196/56932
PMID: 39106099
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11336492/

Abstract

BACKGROUND: Despite their growing use in health care, pretrained language models (PLMs) often lack clinical relevance due to insufficient domain expertise and poor interpretability. A key strategy to overcome these challenges is integrating external knowledge into PLMs, enhancing their adaptability and clinical usefulness. Current biomedical knowledge graphs like UMLS (Unified Medical Language System), SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms), and HPO (Human Phenotype Ontology), while comprehensive, fail to effectively connect general biomedical knowledge with physician insights. There is an equally important need for a model that integrates diverse knowledge in a way that is both unified and compartmentalized. This approach not only addresses the heterogeneous nature of domain knowledge but also recognizes the unique data and knowledge repositories of individual health care institutions, necessitating careful and respectful management of proprietary information.

OBJECTIVE: This study aimed to enhance the clinical relevance and interpretability of PLMs by integrating external knowledge in a manner that respects the diversity and proprietary nature of health care data. We hypothesize that domain knowledge, when captured and distributed as stand-alone modules, can be effectively reintegrated into PLMs to significantly improve their adaptability and utility in clinical settings.

METHODS: We demonstrate that through adapters, small and lightweight neural networks that enable the integration of extra information without full model fine-tuning, we can inject diverse sources of external domain knowledge into language models and improve the overall performance with an increased level of interpretability. As a practical application of this methodology, we introduce a novel task, structured as a case study, that endeavors to capture physician knowledge in assigning cardiovascular diagnoses from clinical narratives, where we extract diagnosis-comment pairs from electronic health records (EHRs) and cast the problem as text classification.

RESULTS: The study demonstrates that integrating domain knowledge into PLMs significantly improves their performance. While improvements with ClinicalBERT are more modest, likely due to its pretraining on clinical texts, BERT (Bidirectional Encoder Representations from Transformers) equipped with knowledge adapters surprisingly matches or exceeds ClinicalBERT in several metrics. This underscores the effectiveness of knowledge adapters and highlights their potential in settings with strict data privacy constraints. This approach also increases the level of interpretability of these models in a clinical context, which enhances our ability to precisely identify and apply the most relevant domain knowledge for specific tasks, thereby optimizing the model's performance and tailoring it to meet specific clinical needs.

CONCLUSIONS: This research provides a basis for creating health knowledge graphs infused with physician knowledge, marking a significant step forward for PLMs in health care. Notably, the model balances integrating knowledge both comprehensively and selectively, addressing the heterogeneous nature of medical knowledge and the privacy needs of health care institutions.
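
The adapter idea in the METHODS paragraph can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a common bottleneck design (down-projection, ReLU, up-projection, residual connection) and a simple softmax-weighted mix of adapter outputs, neither of which the abstract specifies. The class and function names, the source names `biomedical_kg` and `physician_knowledge`, and the sizes are all hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class BottleneckAdapter:
    """Hypothetical adapter: down-project, ReLU, up-project, add the input back."""

    def __init__(self, hidden_size, bottleneck_size, rng):
        scale = 1.0 / math.sqrt(hidden_size)
        self.down = [[rng.uniform(-scale, scale) for _ in range(hidden_size)]
                     for _ in range(bottleneck_size)]
        # Zero-initialised up-projection: the adapter starts as an identity map,
        # so plugging it into a frozen model does not change its outputs at first.
        self.up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def __call__(self, hidden):
        bottleneck = [max(0.0, v) for v in matvec(self.down, hidden)]
        delta = matvec(self.up, bottleneck)
        return [h + d for h, d in zip(hidden, delta)]

    def num_parameters(self):
        return sum(len(row) for row in self.down) + sum(len(row) for row in self.up)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_adapters(hidden, adapters, source_scores):
    """Weighted sum of adapter outputs; also returns the weight given to each source."""
    names = list(adapters)
    weights = softmax([source_scores[name] for name in names])
    outputs = [adapters[name](hidden) for name in names]
    mixed = [sum(w * out[i] for w, out in zip(weights, outputs))
             for i in range(len(hidden))]
    return mixed, dict(zip(names, weights))


# Toy usage: one adapter per (hypothetical) knowledge source.
rng = random.Random(0)
hidden_size, bottleneck_size = 8, 2
adapters = {
    "biomedical_kg": BottleneckAdapter(hidden_size, bottleneck_size, rng),
    "physician_knowledge": BottleneckAdapter(hidden_size, bottleneck_size, rng),
}
hidden = [rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
scores = {"biomedical_kg": 0.0, "physician_knowledge": 0.0}
mixed, weights = mix_adapters(hidden, adapters, scores)
```

In this toy setting each adapter holds 32 weights, against 64 for a full 8-by-8 layer, which is the sense in which adapters are lightweight. Because each knowledge source lives in its own module, a module can be distributed or withheld independently, and the per-source mixing weights give a rough indication of which source a prediction drew on.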


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f782/11336492/783b9a2ace08/ai_v3i1e56932_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f782/11336492/c8581095533e/ai_v3i1e56932_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f782/11336492/6eae8da9ef42/ai_v3i1e56932_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f782/11336492/8f5a65b3917f/ai_v3i1e56932_fig4.jpg


