A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation.

Authors

Asgari Elham, Montaña-Brown Nina, Dubois Magda, Khalil Saleh, Balloch Jasmine, Yeung Joshua Au, Pimenta Dominic

Affiliations

Tortus AI, London, UK.

Guy's and St Thomas NHS Trust, London, UK.

Publication information

NPJ Digit Med. 2025 May 13;8(1):274. doi: 10.1038/s41746-025-01670-7.

Abstract

Integrating large language models (LLMs) into healthcare can improve workflow efficiency and patient care by automating tasks such as summarising consultations. However, fidelity between LLM outputs and ground-truth information is vital to prevent miscommunication that could compromise patient safety. We propose a framework comprising (1) an error taxonomy for classifying LLM outputs, (2) an experimental structure for iterative comparisons in our LLM document generation pipeline, (3) a clinical safety framework to evaluate the harms of errors, and (4) a graphical user interface, CREOLA, to facilitate these processes. Our clinical error metrics were derived from 18 experimental configurations involving LLMs for clinical note generation, comprising 12,999 clinician-annotated sentences. We observed a 1.47% hallucination rate and a 3.45% omission rate. By refining prompts and workflows, we reduced major errors below previously reported human note-taking rates, highlighting the framework's potential for safer clinical documentation.
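To make the idea of sentence-level error metrics concrete, the following is a minimal sketch of how clinician annotations might be tallied into hallucination and omission rates. It is not the authors' implementation or the CREOLA data model: the names `ErrorType`, `Severity`, `Annotation`, `error_rates`, and `major_error_rate` are hypothetical, the toy sentences are invented, and the sketch assumes that a rate is simply the fraction of annotated sentences carrying a given label. The paper may use different denominators, for example counting omissions against source-transcript sentences instead of generated-note sentences.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    # Hypothetical two-label taxonomy; the paper's taxonomy may be richer.
    NONE = "none"
    HALLUCINATION = "hallucination"
    OMISSION = "omission"


class Severity(Enum):
    # Hypothetical harm grading, assumed to be assigned by a clinician.
    MINOR = "minor"
    MAJOR = "major"


@dataclass(frozen=True)
class Annotation:
    sentence: str
    error: ErrorType = ErrorType.NONE
    severity: Optional[Severity] = None


def error_rates(annotations):
    """Return the fraction of annotated sentences carrying each error label."""
    total = len(annotations)
    if total == 0:
        raise ValueError("no annotations to score")
    counts = Counter(a.error for a in annotations)
    return {e: counts[e] / total for e in ErrorType if e is not ErrorType.NONE}


def major_error_rate(annotations):
    """Return the fraction of annotated sentences with a major-severity error."""
    total = len(annotations)
    if total == 0:
        raise ValueError("no annotations to score")
    major = sum(1 for a in annotations if a.severity is Severity.MAJOR)
    return major / total


# Invented toy data, for illustration only (not from the study).
toy = [
    Annotation("Patient reports chest pain for two days."),
    Annotation("Patient is allergic to penicillin.",
               ErrorType.HALLUCINATION, Severity.MAJOR),
    Annotation("Family history of diabetes was discussed.",
               ErrorType.OMISSION, Severity.MINOR),
    Annotation("Follow-up arranged in two weeks."),
]

rates = error_rates(toy)
print(rates[ErrorType.HALLUCINATION], rates[ErrorType.OMISSION])
print(major_error_rate(toy))
```

Keeping the error label separate from the severity grade mirrors the abstract's distinction between classifying errors and evaluating their harm, so the same annotations can feed both an overall rate and a major-error rate.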

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c30f/12075489/a0eaa8b446d7/41746_2025_1670_Fig2_HTML.jpg
