
Deep learning-based natural language processing for detecting medical symptoms and histories in emergency patient triage.

Affiliations

Department of Applied Artificial Intelligence, Hanyang University ERICA, Ansan, Republic of Korea.

Department of Emergency Medicine, College of Medicine, Hanyang University, Seoul, Republic of Korea.

Publication Info

Am J Emerg Med. 2024 Mar;77:29-38. doi: 10.1016/j.ajem.2023.11.063. Epub 2023 Dec 10.


DOI: 10.1016/j.ajem.2023.11.063
PMID: 38096637
Abstract

OBJECTIVE: The manual recording of electronic health records (EHRs) by clinicians in the emergency department (ED) is time-consuming and challenging. In light of recent advancements in large language models (LLMs) such as GPT and BERT, this study aimed to design and validate LLMs for automatic clinical diagnosis. The models were designed to identify 12 medical symptoms and 2 patient histories from simulated clinician-patient conversations across 6 primary-symptom scenarios in emergency triage rooms.

MATERIALS AND METHODS: We developed classification models by fine-tuning BERT, a transformer-based pre-trained model. We then analyzed these models using explainable artificial intelligence (XAI) with the Shapley additive explanations (SHAP) method. A Turing test was conducted to ascertain the reliability of the XAI results by comparing them with the outcomes of the same tasks performed and explained by medical workers. An emergency medicine specialist assessed the results of both the XAI and the medical workers.

RESULTS: We fine-tuned four pre-trained LLMs and compared their classification performance. The KLUE-RoBERTa-based model achieved the highest performance (F1-score: 0.965, AUROC: 0.893) on human-transcribed script data. The SHAP-based XAI results showed an average Jaccard similarity of 0.722 with the explanations of medical workers across 15 samples. The Turing test revealed a small 6% gap, with XAI and medical workers receiving mean scores of 3.327 and 3.52, respectively.

CONCLUSION: This paper highlights the potential of LLMs for automatic EHR recording in Korean EDs. The KLUE-RoBERTa-based model demonstrated superior classification performance, and XAI using SHAP provided reliable explanations for model outputs, whose reliability was confirmed by a Turing test.

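The reported agreement between the SHAP-highlighted tokens and the clinicians' explanations is a Jaccard similarity (0.722 on average over 15 samples). A minimal sketch of that comparison, using hypothetical token sets (the actual highlighted tokens are not given in the abstract):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two token sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty explanations count as identical
    return len(a & b) / len(a | b)

# Hypothetical example: tokens a SHAP explanation highlights vs. tokens
# a clinician marks as decisive for the same symptom label.
shap_tokens = {"chest", "pain", "radiating", "sweating"}
clinician_tokens = {"chest", "pain", "sweating", "nausea"}

score = jaccard(shap_tokens, clinician_tokens)  # 3 shared / 5 total = 0.6
```

In the study, averaging such per-sample scores over the 15 evaluated samples yields the reported 0.722.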

Similar Articles

[1]
Deep learning-based natural language processing for detecting medical symptoms and histories in emergency patient triage.

Am J Emerg Med. 2024-3

[2]
Automatic Classification of the Korean Triage Acuity Scale in Simulated Emergency Rooms Using Speech Recognition and Natural Language Processing: a Proof of Concept Study.

J Korean Med Sci. 2021-7-12

[3]
Identifying signs and symptoms of urinary tract infection from emergency department clinical notes using large language models.

Acad Emerg Med. 2024-6

[4]
Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.

J Med Internet Res. 2024-6-14

[5]
The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review.

JMIR Med Inform. 2024-5-10

[6]
Automatic quantitative stroke severity assessment based on Chinese clinical named entity recognition with domain-adaptive pre-trained large language model.

Artif Intell Med. 2024-4

[7]
Understanding natural language: Potential application of large language models to ophthalmology.

Asia Pac J Ophthalmol (Phila). 2024

[8]
Evaluating large language models for health-related text classification tasks with public social media data.

J Am Med Inform Assoc. 2024-10-1

[9]
Identification of patients' smoking status using an explainable AI approach: a Danish electronic health records case study.

BMC Med Res Methodol. 2024-5-17

[10]
Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.

JMIR Med Educ. 2024-2-13

Cited By

[1]
To take a different approach: Can large language models provide knowledge related to respiratory aspiration?

Digit Health. 2025-7-10

[2]
Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis.

J Med Internet Res. 2025-6-9

[3]
Artificial intelligence for severity triage based on conversations in an emergency department in Korea.

Sci Rep. 2025-5-15

[4]
Analyzing electronic medical records to extract prepregnancy morbidities and pregnancy complications: Toward a learning health system.

Learn Health Syst. 2024-11-26

[5]
Clinician voices on ethics of LLM integration in healthcare: a thematic analysis of ethical concerns and implications.

BMC Med Inform Decis Mak. 2024-9-9

[6]
Automated Extraction of Patient-Centered Outcomes After Breast Cancer Treatment: An Open-Source Large Language Model-Based Toolkit.

JCO Clin Cancer Inform. 2024-8
