A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation.

Authors

Asgari Elham, Montaña-Brown Nina, Dubois Magda, Khalil Saleh, Balloch Jasmine, Yeung Joshua Au, Pimenta Dominic

Affiliations

Tortus AI, London, UK.

Guy's and St Thomas NHS Trust, London, UK.

Publication information

NPJ Digit Med. 2025 May 13;8(1):274. doi: 10.1038/s41746-025-01670-7.

Abstract

Integrating large language models (LLMs) into healthcare can enhance workflow efficiency and patient care by automating tasks such as summarising consultations. However, the fidelity between LLM outputs and ground truth information is vital to prevent miscommunication that could lead to compromise in patient safety. We propose a framework comprising (1) an error taxonomy for classifying LLM outputs, (2) an experimental structure for iterative comparisons in our LLM document generation pipeline, (3) a clinical safety framework to evaluate the harms of errors, and (4) a graphical user interface, CREOLA, to facilitate these processes. Our clinical error metrics were derived from 18 experimental configurations involving LLMs for clinical note generation, consisting of 12,999 clinician-annotated sentences. We observed a 1.47% hallucination rate and a 3.45% omission rate. By refining prompts and workflows, we successfully reduced major errors below previously reported human note-taking rates, highlighting the framework's potential for safer clinical documentation.

Figure 2 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c30f/12075489/a0eaa8b446d7/41746_2025_1670_Fig2_HTML.jpg
