Abdellaoui Chaima, Redjdal Akram, Seroussi Brigitte
Sorbonne Université, INSERM, Université Sorbonne Paris Nord, LIMICS, Paris, France.
Univ Gustave Eiffel, Aix-Marseille Univ, LBA, F-13016 Marseille, France.
Stud Health Technol Inform. 2025 Jun 26;328:193-197. doi: 10.3233/SHTI250700.
The growing adoption of Large Language Models (LLMs) in clinical settings could transform how information is extracted from clinical documents. Yet challenges persist regarding reliability, hallucinations, and data privacy. This scoping review examines 16 studies (2019-2025) to evaluate how effectively LLMs extract structured data from clinical practice guidelines and clinical notes, with a focus on prompt engineering strategies and model performance. GPT-4 emerged as the top-performing model, leading in 11 of the 16 studies with accuracy or F1-score above 85% in entity extraction. However, performance varied across document types and privacy safeguards remain necessary, so further research and ethical consideration are needed before large-scale clinical deployment.