
Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support.

Authors

Omar Mahmud, Sorin Vera, Collins Jeremy D, Reich David, Freeman Robert, Gavin Nicholas, Charney Alexander, Stump Lisa, Bragazzi Nicola Luigi, Nadkarni Girish N, Klang Eyal

Affiliations

The Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Medical Center, New York, NY, USA.

The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication

Commun Med (Lond). 2025 Aug 2;5(1):330. doi: 10.1038/s43856-025-01021-3.


DOI: 10.1038/s43856-025-01021-3
PMID: 40753316
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12318031/
Abstract

BACKGROUND: Large language models (LLMs) show promise in clinical contexts but can generate false facts (often referred to as "hallucinations"). One subset of these errors arises from adversarial attacks, in which fabricated details embedded in prompts lead the model to produce or elaborate on the false information. We embedded fabricated content in clinical prompts to elicit adversarial hallucination attacks in multiple large language models. We quantified how often they elaborated on false details and tested whether a specialized mitigation prompt or altered temperature settings reduced errors.

METHODS: We created 300 physician-validated simulated vignettes, each containing one fabricated detail (a laboratory test, a physical or radiological sign, or a medical condition). Each vignette was presented in short and long versions, differing only in word count but identical in medical content. We tested six LLMs under three conditions: default (standard settings), mitigating prompt (designed to reduce hallucinations), and temperature 0 (deterministic output with maximum response certainty), generating 5,400 outputs. If a model elaborated on the fabricated detail, the case was classified as a "hallucination".

RESULTS: Hallucination rates range from 50% to 82% across models and prompting methods. Prompt-based mitigation lowers the overall hallucination rate (mean across all models) from 66% to 44% (p < 0.001). For the best-performing model, GPT-4o, rates decline from 53% to 23% (p < 0.001). Temperature adjustments offer no significant improvement. Short vignettes show slightly higher odds of hallucination.

CONCLUSIONS: LLMs are highly susceptible to adversarial hallucination attacks, frequently generating false clinical details that pose risks when used without safeguards. While prompt engineering reduces errors, it does not eliminate them.
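The abstract describes the protocol only at a high level. As a rough illustration, the sketch below shows how such an adversarial-hallucination test might be wired up: one fabricated detail per vignette, three prompting conditions, and one pass over every model-vignette pair. The helper names (query_llm, MITIGATION_PROMPT), the rejection keywords, and the keyword-based classifier are hypothetical stand-ins invented for this sketch, not the authors' implementation; in the study itself, an output counted as a hallucination whenever the model elaborated on the fabricated detail.

```python
"""
Minimal sketch of an adversarial-hallucination test on clinical vignettes,
in the spirit of the protocol described in the abstract. All names below
are hypothetical stand-ins, not the study's code.
"""
from dataclasses import dataclass
from typing import Optional

# Hypothetical mitigation prompt: asks the model to verify details before elaborating.
MITIGATION_PROMPT = (
    "Before answering, verify every clinical detail in the case against established "
    "medical knowledge. If a test, sign, or condition cannot be confirmed to exist, "
    "say so explicitly instead of elaborating on it."
)
DEFAULT_PROMPT = "You are a clinical decision-support assistant."

@dataclass
class Vignette:
    case_id: int
    text: str                # full vignette text (short or long version)
    fabricated_detail: str   # the single false element embedded in the text

def query_llm(model: str, system: str, user: str, temperature: Optional[float]) -> str:
    """Placeholder for a real chat-completion call; temperature=None means provider default."""
    raise NotImplementedError

def is_hallucination(response: str, fabricated_detail: str) -> bool:
    """Crude keyword proxy for the study's criterion (elaborating on the fabricated detail)."""
    text = response.lower()
    mentions = fabricated_detail.lower() in text
    rejects = any(k in text for k in ("does not exist", "not a recognized", "cannot verify"))
    return mentions and not rejects

def run_condition(models: list[str], vignettes: list[Vignette], condition: str) -> dict[str, float]:
    """Return the hallucination rate per model under one of the three prompting conditions."""
    rates: dict[str, float] = {}
    for model in models:
        system = MITIGATION_PROMPT if condition == "mitigating_prompt" else DEFAULT_PROMPT
        temperature = 0.0 if condition == "temperature_0" else None
        flagged = sum(
            is_hallucination(query_llm(model, system, v.text, temperature), v.fabricated_detail)
            for v in vignettes
        )
        rates[model] = flagged / len(vignettes)
    return rates
```

Under this layout, one call per model, vignette, and condition over six models, 300 vignettes, and three conditions gives 6 × 300 × 3 = 5,400 outputs, which matches the count reported in the abstract.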

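The headline result, a drop in the pooled hallucination rate from 66% to 44% with p < 0.001, is a comparison of proportions over a large number of outputs. The abstract does not name the statistical test used, so the snippet below is only a generic two-proportion z-test, with the per-condition sample size inferred by splitting the 5,400 outputs evenly across the three conditions; the study's actual analysis may differ (for example, it may account for repeated vignettes per model).

```python
# Illustrative check of the pooled-rate comparison (66% vs 44%) using a generic
# two-proportion z-test. The 1,800-per-condition sample size is an assumption,
# inferred from splitting 5,400 outputs evenly across the three conditions.
from statsmodels.stats.proportion import proportions_ztest

n = 5400 // 3                                  # outputs per prompting condition, if split evenly
counts = [round(0.66 * n), round(0.44 * n)]    # hallucinations under default vs. mitigating prompt
stat, p_value = proportions_ztest(counts, [n, n])
print(f"z = {stat:.2f}, p = {p_value:.3g}")    # a gap this large over ~1,800 cases yields p far below 0.001
```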

Figures (PMC full text):
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d72/12318031/d8c615825acc/43856_2025_1021_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d72/12318031/f414cf66b751/43856_2025_1021_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d72/12318031/f8d59facca02/43856_2025_1021_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d72/12318031/9c4f344891a3/43856_2025_1021_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d72/12318031/3487c39e1c50/43856_2025_1021_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d72/12318031/0989a1c83c4a/43856_2025_1021_Fig6_HTML.jpg

Similar Articles

[1]
Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support.

Commun Med (Lond). 2025-8-2

[2]
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022-5-20

[3]
Sexual Harassment and Prevention Training

2025-1

[4]
Improving Large Language Models' Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation.

JMIR Med Inform. 2025-7-24

[5]
Short-Term Memory Impairment

2025-1

[6]
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.

Clin Orthop Relat Res. 2024-12-1

[7]
Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.

Syst Rev. 2024-11-26

[8]
Emotional prompting amplifies disinformation generation in AI large language models.

Front Artif Intell. 2025-4-7

[9]
Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?

Clin Orthop Relat Res. 2024-9-1

[10]
Intravenous magnesium sulphate and sotalol for prevention of atrial fibrillation after coronary artery bypass surgery: a systematic review and economic evaluation.

Health Technol Assess. 2008-6

Cited By

[1]
AI Agents in Clinical Medicine: A Systematic Review.

medRxiv. 2025-8-26

References

[1]
Multimodal LLMs for retinal disease diagnosis via OCT: few-shot versus single-shot learning.

Ther Adv Ophthalmol. 2025-5-20

[2]
Sociodemographic biases in medical decision making by large language models.

Nat Med. 2025-4-7

[3]
Generating credible referenced medical research: A comparative study of openAI's GPT-4 and Google's gemini.

Comput Biol Med. 2025-2

[4]
Use of Generative AI to Identify Helmet Status Among Patients With Micromobility-Related Injuries From Unstructured Clinical Notes.

JAMA Netw Open. 2024-8-1

[5]
Quantifying the uncertainty of LLM hallucination spreading in complex adaptive social networks.

Sci Rep. 2024-7-16

[6]
Detecting hallucinations in large language models using semantic entropy.

Nature. 2024-6

[7]
Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis.

J Med Internet Res. 2024-5-22

[8]
Utilizing natural language processing and large language models in the diagnosis and prediction of infectious diseases: A systematic review.

Am J Infect Control. 2024-9

[9]
Using artificial intelligence to create diverse and inclusive medical case vignettes for education.

Br J Clin Pharmacol. 2024-3

[10]
The future landscape of large language models in medicine.

Commun Med (Lond). 2023-10-10
