
Using cognitive psychology to understand GPT-3.

Affiliation

Max Planck Research Group (MPRG) Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Tübingen 72076, Germany.

Publication information

Proc Natl Acad Sci U S A. 2023 Feb 7;120(6):e2218523120. doi: 10.1073/pnas.2218523120. Epub 2023 Feb 2.

DOI:10.1073/pnas.2218523120
PMID:36730192
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9963545/
Abstract

We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3's decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3's behavior is impressive: It solves vignette-based tasks similarly or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multiarmed bandit task, and shows signatures of model-based reinforcement learning. Yet, we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. Taken together, these results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1843/9963545/1cd6158c3690/pnas.2218523120fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1843/9963545/dc008a74c259/pnas.2218523120fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1843/9963545/53544846d5ae/pnas.2218523120fig03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1843/9963545/61f3f7385d77/pnas.2218523120fig04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1843/9963545/c8ccad9cb28c/pnas.2218523120fig05.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1843/9963545/92b731f73272/pnas.2218523120fig06.jpg

Similar articles

1
Using cognitive psychology to understand GPT-3.
Proc Natl Acad Sci U S A. 2023 Feb 7;120(6):e2218523120. doi: 10.1073/pnas.2218523120. Epub 2023 Feb 2.
2
The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study.
Lancet Digit Health. 2024 Aug;6(8):e555-e561. doi: 10.1016/S2589-7500(24)00097-9.
3
Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.
JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.
4
Language models and psychological sciences.
Front Psychol. 2023 Oct 20;14:1279317. doi: 10.3389/fpsyg.2023.1279317. eCollection 2023.
5
Overlap in meaning is a stronger predictor of semantic activation in GPT-3 than in humans.
Sci Rep. 2023 Mar 28;13(1):5035. doi: 10.1038/s41598-023-32248-6.
6
Covariation of learning and "reasoning" abilities in mice: evolutionary conservation of the operations of intelligence.
J Exp Psychol Anim Behav Process. 2012 Apr;38(2):109-24. doi: 10.1037/a0027355. Epub 2012 Mar 19.
7
Multi-task reinforcement learning in humans.
Nat Hum Behav. 2021 Jun;5(6):764-773. doi: 10.1038/s41562-020-01035-y. Epub 2021 Jan 28.
8
Diagnostic accuracy of large language models in psychiatry.
Asian J Psychiatr. 2024 Oct;100:104168. doi: 10.1016/j.ajp.2024.104168. Epub 2024 Jul 25.
9
Episodic memory governs choices: An RNN-based reinforcement learning model for decision-making task.
Neural Netw. 2021 Feb;134:1-10. doi: 10.1016/j.neunet.2020.11.003. Epub 2020 Nov 18.
10
Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.
J Med Internet Res. 2024 Jan 12;26:e48996. doi: 10.2196/48996.

Cited by

1
Active use of latent tree-structured sentence representation in humans and large language models.
Nat Hum Behav. 2025 Sep 10. doi: 10.1038/s41562-025-02297-0.
2
GPT-4V shows human-like social perceptual capabilities at phenomenological and neural levels.
Imaging Neurosci (Camb). 2025 Sep 2;3. doi: 10.1162/IMAG.a.134. eCollection 2025.
3
Can Large Language Models Simulate Spoken Human Conversations?
Cogn Sci. 2025 Sep;49(9):e70106. doi: 10.1111/cogs.70106.
4
The paradox of creativity in generative AI: high performance, human-like bias, and limited differential evaluation.
Front Psychol. 2025 Aug 7;16:1628486. doi: 10.3389/fpsyg.2025.1628486. eCollection 2025.
5
Capturing Argument in Agent-Based Models.
Topoi (Dordr). 2025;44(3):675-693. doi: 10.1007/s11245-025-10215-2. Epub 2025 Jun 6.
6
Testing for completions that simulate altruism in early language models.
Nat Hum Behav. 2025 Jul 28. doi: 10.1038/s41562-025-02258-7.
7
A foundation model to predict and capture human cognition.
Nature. 2025 Jul 2. doi: 10.1038/s41586-025-09215-4.
8
Cultural tendencies in generative AI.
Nat Hum Behav. 2025 Jun 20. doi: 10.1038/s41562-025-02242-1.
9
Large language models show amplified cognitive biases in moral decision-making.
Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2412015122. doi: 10.1073/pnas.2412015122. Epub 2025 Jun 20.
10
Examining Chat GPT with nonwords and machine psycholinguistic techniques.
PLoS One. 2025 Jun 6;20(6):e0325612. doi: 10.1371/journal.pone.0325612. eCollection 2025.