

Deception abilities emerged in large language models.

Affiliation

Interchange Forum for Reflecting on Intelligent Systems, University of Stuttgart, Stuttgart 70569, Germany.

Publication

Proc Natl Acad Sci U S A. 2024 Jun 11;121(24):e2317967121. doi: 10.1073/pnas.2317967121. Epub 2024 Jun 4.

DOI: 10.1073/pnas.2317967121
PMID: 38833474
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11181111/
Abstract

Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, but were nonexistent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can trigger misaligned deceptive behavior. GPT-4, for instance, exhibits deceptive behavior in simple test scenarios 99.16% of the time (P < 0.001). In complex second-order deception test scenarios where the aim is to mislead someone who expects to be deceived, GPT-4 resorts to deceptive behavior 71.46% of the time (P < 0.001) when augmented with chain-of-thought reasoning. In sum, revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology.
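The abstract benchmarks deception rates against chance with binomial significance levels. As a rough illustration only (the trial count below is hypothetical, not taken from the paper), a one-sided exact binomial test of a deception rate against the 50% chance level can be sketched in Python:

```python
from math import comb

def binom_sf(k: int, n: int) -> float:
    """Exact one-sided P(X >= k) under H0: fair coin (success prob 0.5).

    Uses integer arithmetic for the tail sum, so no floating-point
    underflow occurs even for large n.
    """
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Hypothetical example: 952 deceptive responses out of 960 trials (~99.16%).
n_trials = 960
k_deceptive = round(0.9916 * n_trials)  # 952
p_value = binom_sf(k_deceptive, n_trials)
print(p_value < 0.001)  # → True: far above the 50% chance level
```

A result reported as P < 0.001 simply means this tail probability falls below that threshold; the paper's actual trial counts and test procedure are described in its methods, not reproduced here.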


Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6060/11181111/c92cf799e9a7/pnas.2317967121fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6060/11181111/2f94f11e97b9/pnas.2317967121fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6060/11181111/f2b41d01d9c1/pnas.2317967121fig03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6060/11181111/c12cbbd5ea03/pnas.2317967121fig04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6060/11181111/d80e67e1fd9d/pnas.2317967121fig05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6060/11181111/ec01b4055577/pnas.2317967121fig06.jpg
