Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support.

Author information

Omar Mahmud, Sorin Vera, Collins Jeremy D, Reich David, Freeman Robert, Gavin Nicholas, Charney Alexander, Stump Lisa, Bragazzi Nicola Luigi, Nadkarni Girish N, Klang Eyal

Affiliations

The Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Medical Center, New York, NY, USA.

The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication information

Commun Med (Lond). 2025 Aug 2;5(1):330. doi: 10.1038/s43856-025-01021-3.

Abstract

BACKGROUND

Large language models (LLMs) show promise in clinical contexts but can generate false facts (often referred to as "hallucinations"). One subset of these errors arises from adversarial attacks, in which fabricated details embedded in prompts lead the model to produce or elaborate on the false information. We embedded fabricated content in clinical prompts to elicit adversarial hallucination attacks in multiple large language models. We quantified how often they elaborated on false details and tested whether a specialized mitigation prompt or altered temperature settings reduced errors.

METHODS

We created 300 physician-validated simulated vignettes, each containing one fabricated detail (a laboratory test, a physical or radiological sign, or a medical condition). Each vignette was presented in short and long versions, differing only in word count but identical in medical content. We tested six LLMs under three conditions: default (standard settings), mitigating prompt (designed to reduce hallucinations), and temperature 0 (deterministic output with maximum response certainty), generating 5,400 outputs. If a model elaborated on the fabricated detail, the case was classified as a "hallucination".
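
The protocol above amounts to a models-by-conditions evaluation loop. Below is a minimal sketch, assuming an OpenAI-style chat-completions client; the model handling, the mitigation wording, and the toy string-match check (standing in for the study's adjudication of whether a model elaborated on the fabricated detail) are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the models-by-conditions evaluation loop (illustrative only).
# Assumptions not taken from the paper: an OpenAI-style chat-completions client,
# an invented mitigation-prompt wording, and a toy string match standing in for
# the study's manual "elaborated on the fabricated detail" adjudication.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

MITIGATING_PROMPT = (  # illustrative wording, not the paper's actual prompt
    "You are a careful clinical assistant. If any detail in the case cannot be "
    "verified or appears fabricated, flag it explicitly rather than elaborating."
)

# Three conditions mirroring the abstract: default, mitigating prompt, temperature 0.
CONDITIONS = {
    "default": {"system": None, "temperature": None},
    "mitigating_prompt": {"system": MITIGATING_PROMPT, "temperature": None},
    "temperature_0": {"system": None, "temperature": 0},
}


def run_case(model: str, vignette: str, condition: str) -> str:
    """Send one vignette to one model under one condition; return the reply text."""
    cfg = CONDITIONS[condition]
    messages = []
    if cfg["system"] is not None:
        messages.append({"role": "system", "content": cfg["system"]})
    messages.append({"role": "user", "content": vignette})
    kwargs = {"model": model, "messages": messages}
    if cfg["temperature"] is not None:
        kwargs["temperature"] = cfg["temperature"]
    return client.chat.completions.create(**kwargs).choices[0].message.content


def is_hallucination(reply: str, fabricated_detail: str) -> bool:
    """Toy proxy: flag the case if the reply repeats the fabricated detail.
    The study itself classified cases by whether the model elaborated on it."""
    return fabricated_detail.lower() in reply.lower()
```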

RESULTS

Hallucination rates range from 50% to 82% across models and prompting methods. Prompt-based mitigation lowers the overall hallucination rate (mean across all models) from 66% to 44% (p < 0.001). For the best-performing model, GPT-4o, rates decline from 53% to 23% (p < 0.001). Temperature adjustments offer no significant improvement. Short vignettes show slightly higher odds of hallucination.
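
As an aside on the reported comparisons: a drop of this kind (for example 66% versus 44%) can be checked with a simple two-proportion z-test, sketched below with purely illustrative counts; the study's own statistical analysis may have used different methods.

```python
# Minimal sketch of a two-proportion z-test for a drop like 66% -> 44%.
# The counts are illustrative placeholders (roughly 66% and 44% of 1,800
# outputs per condition), not the study's raw data, and the authors'
# own statistical analysis may have differed.
from statsmodels.stats.proportion import proportions_ztest

hallucinated = [1188, 792]   # default vs. mitigating-prompt condition (illustrative)
totals = [1800, 1800]        # outputs per condition

z_stat, p_value = proportions_ztest(count=hallucinated, nobs=totals)
print(f"z = {z_stat:.2f}, p = {p_value:.3g}")
```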

CONCLUSIONS

LLMs are highly susceptible to adversarial hallucination attacks, frequently generating false clinical details that pose risks when used without safeguards. While prompt engineering reduces errors, it does not eliminate them.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d72/12318031/d8c615825acc/43856_2025_1021_Fig1_HTML.jpg
