Moser Denis, Bender Matthias, Sariyar Murat
Department of Medical Informatics, Bern University of Applied Sciences, Biel/Bienne, Switzerland.
Appl Artif Intell. 2025 Jun 18;39(1):2519169. doi: 10.1080/08839514.2025.2519169. eCollection 2025.
Accurate and efficient documentation of patient information is vital in emergency healthcare settings. Traditional manual documentation methods are often time-consuming and prone to errors, potentially affecting patient outcomes. Large Language Models (LLMs) offer a promising solution to enhance medical communication systems; however, their clinical deployment, particularly in non-English languages such as German, presents challenges related to content accuracy, clinical relevance, and data privacy. This study addresses these challenges by developing and evaluating an automated pipeline for emergency medical documentation in German. The research objectives include (1) generating synthetic dialogues with known ground truth data to create controlled datasets for evaluating NLP performance and (2) designing an innovative pipeline to retrieve essential clinical information from these dialogues. A subset of 100 anonymized patient records from the MIMIC-IV-ED dataset was selected, ensuring diversity in demographics, chief complaints, and conditions. A Retrieval-Augmented Generation (RAG) system extracted key nominal and numerical features using chunking, embedding, and dynamic prompts. Evaluation metrics included precision, recall, F1-score, and sentiment analysis. Initial results demonstrated high extraction accuracy, particularly in medication data (F1-scores: 86.21%-100%), though performance declined in nuanced clinical language, requiring further refinement for real-world emergency settings.
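The precision, recall, and F1-score named above can be illustrated with a minimal sketch. It assumes that extracted features (for example, medication names) are compared with the ground truth as sets of normalized strings; the abstract does not specify the matching rule, so the function name `score_extraction` and the sample data are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score_extraction(predicted: set[str], truth: set[str]) -> Scores:
    """Compare extracted items with ground-truth items by exact set match.

    Assumption: both sets hold normalized strings; the study's actual
    matching rule is not described in the abstract.
    """
    tp = len(predicted & truth)   # items extracted and present in the ground truth
    fp = len(predicted - truth)   # items extracted but not in the ground truth
    fn = len(truth - predicted)   # ground-truth items the extraction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Hypothetical medication lists for one synthetic dialogue.
truth = {"ibuprofen", "metoprolol", "aspirin"}
predicted = {"ibuprofen", "aspirin", "paracetamol"}
scores = score_extraction(predicted, truth)
print(f"P={scores.precision:.2%} R={scores.recall:.2%} F1={scores.f1:.2%}")
```

Because the synthetic dialogues are generated from records with known ground truth, a per-feature score of this kind can be computed for each dialogue and then aggregated across the dataset.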