

Evaluating the ability of large language models to predict human social decisions.

Authors

Feng Xiao, XiaoTian Wang

Affiliation

Department of Applied Psychology, School of Humanities and Social Science, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Boulevard, 518172, Shenzhen, China.

Publication

Sci Rep. 2025 Sep 2;15(1):32290. doi: 10.1038/s41598-025-17188-7.

DOI: 10.1038/s41598-025-17188-7
PMID: 40897780
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12405550/
Abstract

Recent advances in large language models (LLMs) have highlighted their potential to predict human decisions. In two studies, we compared predictions by GPT-3.5 and GPT-4 across 51 scenarios (9,600 responses) against published data from 2,104 human participants within an evolutionary-psychology framework. We further examined our findings with GPT-4o across eight social-group and kinship conditions (1,600 responses). Our results revealed behavioral differences between humans and LLMs' predictions: Humans showed a greater sensitivity to kinship and group size than the LLMs when making life-death decisions. LLMs align closer with humans with a higher risk-seeking preference in financial domains. While human choices followed Prospect theory's value function (risk-averse in gains, risk-seeking in losses), LLMs often predicted reversed patterns. GPT-3.5 matched the average level of human risk preference but showed reversed framing effects; GPT-4 was indiscriminately risk-averse across social contexts. While humans were more risk-seeking in small or kin groups than in large groups, GPT-4o made the opposite predictions. Our results suggest a set of criteria for a psychological version of the Turing Test reflected in framing effects and social context-dependent risk preference involving kinship, group size, social relations, sense of fairness, self-age awareness, public vs. personal properties, and social group-dependent aspiration levels.
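The human baseline the abstract refers to can be made concrete with Prospect theory's value function. The sketch below is purely illustrative (it is not the paper's code); the parameter values alpha = beta = 0.88 and lambda = 2.25 are Tversky and Kahneman's 1992 median estimates, and the 50/100 outcomes are an assumed framing example. It shows why a concave gain branch predicts risk aversion for gains and a convex, steeper loss branch predicts risk seeking for losses:

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory subjective value of outcome x (reference point = 0)."""
    if x >= 0:
        return x ** alpha           # concave for gains -> risk aversion
    return -lam * (-x) ** beta      # convex and steeper for losses -> risk seeking

# Gain frame: a sure 50 vs. a 50/50 gamble on 100 or 0 (equal expected value).
sure_gain = value(50)
gamble_gain = 0.5 * value(100) + 0.5 * value(0)
assert sure_gain > gamble_gain      # the sure gain is preferred

# Loss frame: a sure -50 vs. a 50/50 gamble on -100 or 0.
sure_loss = value(-50)
gamble_loss = 0.5 * value(-100) + 0.5 * value(0)
assert gamble_loss > sure_loss      # the gamble is preferred over a sure loss
```

Under this function the same objective stakes flip preference with the frame, which is the framing effect the study reports GPT-3.5 reversing and GPT-4 ignoring.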


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a4d/12405550/60b495b406f9/41598_2025_17188_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a4d/12405550/952106a5de44/41598_2025_17188_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a4d/12405550/9d017593137d/41598_2025_17188_Fig3_HTML.jpg

Similar articles

1. Evaluating the ability of large language models to predict human social decisions.
Sci Rep. 2025 Sep 2;15(1):32290. doi: 10.1038/s41598-025-17188-7.
2. Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study.
J Med Internet Res. 2025 May 20;27:e69910. doi: 10.2196/69910.
3. Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers.
J Med Internet Res. 2025 Jul 14;27:e64452. doi: 10.2196/64452.
4. Use of Large Language Models to Classify Epidemiological Characteristics in Synthetic and Real-World Social Media Posts About Conjunctivitis Outbreaks: Infodemiology Study.
J Med Internet Res. 2025 Jul 2;27:e65226. doi: 10.2196/65226.
5. Improving Large Language Models' Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation.
JMIR Med Inform. 2025 Jul 24;13:e66476. doi: 10.2196/66476.
6. Large Language Models and Empathy: Systematic Review.
J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
7. Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis.
JMIR Cancer. 2025 Apr 7;11:e67914. doi: 10.2196/67914.
8. Evaluating Large Language Models for Enhancing Radiology Specialty Examination: A Comparative Study with Human Performance.
Acad Radiol. 2025 May 27. doi: 10.1016/j.acra.2025.05.023.
9. A publicly available benchmark for assessing large language models' ability to predict how humans balance self-interest and the interest of others.
Sci Rep. 2025 Jul 1;15(1):21428. doi: 10.1038/s41598-025-01715-7.
10. Classifying Patient Complaints Using Artificial Intelligence-Powered Large Language Models: Cross-Sectional Study.
J Med Internet Res. 2025 Aug 6;27:e74231. doi: 10.2196/74231.
