Suppr超能文献

当助人适得其反时:大语言模型与谄媚行为导致错误信息的风险

When Helpfulness Backfires: LLMs and the Risk of Misinformation Due to Sycophantic Behavior.

作者信息

Chen Shan, Gao Mingye, Sasse Kuleen, Hartvigsen Thomas, Anthony Brian, Fan Lizhou, Aerts Hugo, Gallifant Jack, Bitterman Danielle S

机构信息

Harvard Medical School.

Massachusetts Institute of Technology.

出版信息

Res Sq. 2025 Apr 21:rs.3.rs-6206365. doi: 10.21203/rs.3.rs-6206365/v1.

Abstract

Large language models (LLMs) exhibit a critical vulnerability arising from being trained to be helpful: a tendency to comply with illogical requests that would generate misinformation, even when they have the knowledge to identify the request as illogical. This study investigated this vulnerability in the medical domain, evaluating five frontier LLMs using prompts that misrepresent equivalent drug relationships. We tested baseline compliance, the impact of prompts allowing rejection and emphasizing factual recall, and the effects of fine-tuning on a dataset of illogical requests, including out-of-distribution generalization. Results showed concerningly high initial compliance (up to 100%) across all models, prioritizing helpfulness over logical consistency. However, prompt engineering and fine-tuning improved performance, achieving near-perfect rejection rates on illogical requests while maintaining general benchmark performance. This demonstrates that prioritizing logical consistency through targeted training and prompting is crucial for mitigating the risk of medical misinformation and ensuring the safe deployment of LLMs in healthcare.

摘要

大语言模型(LLMs)由于被训练得具有“乐于助人”的特性而呈现出一个关键漏洞:即使它们具备识别请求不合逻辑的知识,也倾向于遵从那些会产生错误信息的不合逻辑的请求。本研究在医学领域调查了这一漏洞,使用歪曲等效药物关系的提示来评估五个前沿大语言模型。我们测试了基线遵从度、允许拒绝并强调事实回忆的提示的影响,以及在包括分布外泛化在内的不合逻辑请求数据集上进行微调的效果。结果显示,所有模型的初始遵从度高得令人担忧(高达100%),将“乐于助人”置于逻辑一致性之上。然而,提示工程和微调提高了性能,在不合逻辑的请求上实现了近乎完美的拒绝率,同时保持了总体基准性能。这表明,通过有针对性的训练和提示来优先考虑逻辑一致性,对于减轻医疗错误信息的风险以及确保大语言模型在医疗保健中的安全部署至关重要。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/209c/12045364/a6162d7bbda8/nihpp-rs6206365v1-f0002.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验