• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

当助人适得其反时:大语言模型与谄媚行为导致错误信息的风险

When Helpfulness Backfires: LLMs and the Risk of Misinformation Due to Sycophantic Behavior.

作者信息

Chen Shan, Gao Mingye, Sasse Kuleen, Hartvigsen Thomas, Anthony Brian, Fan Lizhou, Aerts Hugo, Gallifant Jack, Bitterman Danielle S

机构信息

Harvard Medical School.

Massachusetts Institute of Technology.

出版信息

Res Sq. 2025 Apr 21:rs.3.rs-6206365. doi: 10.21203/rs.3.rs-6206365/v1.

DOI:10.21203/rs.3.rs-6206365/v1
PMID:40313755
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC12045364/
Abstract

Large language models (LLMs) exhibit a critical vulnerability arising from being trained to be helpful: a tendency to comply with illogical requests that would generate misinformation, even when they have the knowledge to identify the request as illogical. This study investigated this vulnerability in the medical domain, evaluating five frontier LLMs using prompts that misrepresent equivalent drug relationships. We tested baseline compliance, the impact of prompts allowing rejection and emphasizing factual recall, and the effects of fine-tuning on a dataset of illogical requests, including out-of-distribution generalization. Results showed concerningly high initial compliance (up to 100%) across all models, prioritizing helpfulness over logical consistency. However, prompt engineering and fine-tuning improved performance, achieving near-perfect rejection rates on illogical requests while maintaining general benchmark performance. This demonstrates that prioritizing logical consistency through targeted training and prompting is crucial for mitigating the risk of medical misinformation and ensuring the safe deployment of LLMs in healthcare.

摘要

大语言模型(LLMs)由于被训练得具有“乐于助人”的特性而呈现出一个关键漏洞:即使它们具备识别请求不合逻辑的知识,也倾向于遵从那些会产生错误信息的不合逻辑的请求。本研究在医学领域调查了这一漏洞,使用歪曲等效药物关系的提示来评估五个前沿大语言模型。我们测试了基线遵从度、允许拒绝并强调事实回忆的提示的影响,以及在包括分布外泛化在内的不合逻辑请求数据集上进行微调的效果。结果显示,所有模型的初始遵从度高得令人担忧(高达100%),将“乐于助人”置于逻辑一致性之上。然而,提示工程和微调提高了性能,在不合逻辑的请求上实现了近乎完美的拒绝率,同时保持了总体基准性能。这表明,通过有针对性的训练和提示来优先考虑逻辑一致性,对于减轻医疗错误信息的风险以及确保大语言模型在医疗保健中的安全部署至关重要。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/209c/12045364/423173a7cda4/nihpp-rs6206365v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/209c/12045364/a6162d7bbda8/nihpp-rs6206365v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/209c/12045364/423173a7cda4/nihpp-rs6206365v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/209c/12045364/a6162d7bbda8/nihpp-rs6206365v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/209c/12045364/423173a7cda4/nihpp-rs6206365v1-f0003.jpg

相似文献

1
When Helpfulness Backfires: LLMs and the Risk of Misinformation Due to Sycophantic Behavior.当助人适得其反时:大语言模型与谄媚行为导致错误信息的风险
Res Sq. 2025 Apr 21:rs.3.rs-6206365. doi: 10.21203/rs.3.rs-6206365/v1.
2
Evaluating the Influence of Role-Playing Prompts on ChatGPT's Misinformation Detection Accuracy: Quantitative Study.评估角色扮演提示对 ChatGPT 错误信息检测准确率的影响:定量研究。
JMIR Infodemiology. 2024 Sep 26;4:e60678. doi: 10.2196/60678.
3
Emotional prompting amplifies disinformation generation in AI large language models.情感提示会放大人工智能大语言模型中的虚假信息生成。
Front Artif Intell. 2025 Apr 7;8:1543603. doi: 10.3389/frai.2025.1543603. eCollection 2025.
4
Model tuning or prompt Tuning? a study of large language models for clinical concept and relation extraction.模型调优还是提示调优?大型语言模型在临床概念和关系抽取中的应用研究。
J Biomed Inform. 2024 May;153:104630. doi: 10.1016/j.jbi.2024.104630. Epub 2024 Mar 26.
5
A dataset and benchmark for hospital course summarization with adapted large language models.一个用于医院病程总结的数据集和基准测试,采用了适配的大语言模型。
J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
6
Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs.提示工程在与大语言模型基于证据的指南保持一致性和可靠性方面。
NPJ Digit Med. 2024 Feb 20;7(1):41. doi: 10.1038/s41746-024-01029-4.
7
Utilizing large language models for gastroenterology research: a conceptual framework.利用大语言模型进行胃肠病学研究:一个概念框架。
Therap Adv Gastroenterol. 2025 Apr 1;18:17562848251328577. doi: 10.1177/17562848251328577. eCollection 2025.
8
Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks.评估生物医学微调对大语言模型在临床任务上的有效性。
J Am Med Inform Assoc. 2025 Jun 1;32(6):1015-1024. doi: 10.1093/jamia/ocaf045.
9
An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study.零样本临床自然语言处理中大型语言模型提示策略的实证评估:算法开发与验证研究
JMIR Med Inform. 2024 Apr 8;12:e55318. doi: 10.2196/55318.
10
OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models.OpenMedLM:在使用开源大语言模型进行医学问答时,基于提示的工程学可以胜过微调。
Sci Rep. 2024 Jun 19;14(1):14156. doi: 10.1038/s41598-024-64827-6.

本文引用的文献

1
Medical large language models are susceptible to targeted misinformation attacks.医学大语言模型容易受到针对性错误信息攻击。
NPJ Digit Med. 2024 Oct 23;7(1):288. doi: 10.1038/s41746-024-01282-7.
2
FTC Regulation of AI-Generated Medical Disinformation.联邦贸易委员会对人工智能生成的医疗虚假信息的监管。
JAMA. 2024 Dec 17;332(23):1975-1976. doi: 10.1001/jama.2024.19971.
3
The effect of using a large language model to respond to patient messages.使用大语言模型回复患者信息的效果。
Lancet Digit Health. 2024 Jun;6(6):e379-e381. doi: 10.1016/S2589-7500(24)00060-8. Epub 2024 Apr 24.
4
Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis.大型语言模型防范生成健康类虚假信息的现行保障措施、风险缓解措施和透明度措施:重复横断面分析。
BMJ. 2024 Mar 20;384:e078538. doi: 10.1136/bmj-2023-078538.
5
Adapted large language models can outperform medical experts in clinical text summarization.经过改编的大型语言模型在临床文本总结方面的表现优于医学专家。
Nat Med. 2024 Apr;30(4):1134-1142. doi: 10.1038/s41591-024-02855-5. Epub 2024 Feb 27.
6
Health Disinformation Use Case Highlighting the Urgent Need for Artificial Intelligence Vigilance: Weapons of Mass Disinformation.健康类虚假信息用例凸显了人工智能监管的迫切需求:大规模虚假信息的武器。
JAMA Intern Med. 2024 Jan 1;184(1):92-96. doi: 10.1001/jamainternmed.2023.5947.
7
Large language models encode clinical knowledge.大语言模型编码临床知识。
Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12.