

Evaluating the ability of large language models to predict human social decisions.

Authors

Feng Xiao, XiaoTian Wang

Affiliation

Department of Applied Psychology, School of Humanities and Social Science, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Boulevard, 518172, Shenzhen, China.

Publication

Sci Rep. 2025 Sep 2;15(1):32290. doi: 10.1038/s41598-025-17188-7.

DOI: 10.1038/s41598-025-17188-7
PMID: 40897780
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12405550/
Abstract

Recent advances in large language models (LLMs) have highlighted their potential to predict human decisions. In two studies, we compared predictions by GPT-3.5 and GPT-4 across 51 scenarios (9,600 responses) against published data from 2,104 human participants within an evolutionary-psychology framework. We further examined our findings with GPT-4o across eight social-group and kinship conditions (1,600 responses). Our results revealed behavioral differences between humans and LLMs' predictions: Humans showed a greater sensitivity to kinship and group size than the LLMs when making life-death decisions. LLMs align closer with humans with a higher risk-seeking preference in financial domains. While human choices followed Prospect theory's value function (risk-averse in gains, risk-seeking in losses), LLMs often predicted reversed patterns. GPT-3.5 matched the average level of human risk preference but showed reversed framing effects; GPT-4 was indiscriminately risk-averse across social contexts. While humans were more risk-seeking in small or kin groups than in large groups, GPT-4o made the opposite predictions. Our results suggest a set of criteria for a psychological version of the Turing Test reflected in framing effects and social context-dependent risk preference involving kinship, group size, social relations, sense of fairness, self-age awareness, public vs. personal properties, and social group-dependent aspiration levels.
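The human baseline the abstract refers to can be made concrete with Prospect theory's value function. The sketch below is purely illustrative (it is not the paper's code); the parameter values alpha = beta = 0.88 and lambda = 2.25 are Tversky and Kahneman's 1992 median estimates, and the 50/100 outcomes are an assumed framing example. It shows why a concave gain branch predicts risk aversion for gains and a convex, steeper loss branch predicts risk seeking for losses:

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory subjective value of outcome x (reference point = 0)."""
    if x >= 0:
        return x ** alpha           # concave for gains -> risk aversion
    return -lam * (-x) ** beta      # convex and steeper for losses -> risk seeking

# Gain frame: a sure 50 vs. a 50/50 gamble on 100 or 0 (equal expected value).
sure_gain = value(50)
gamble_gain = 0.5 * value(100) + 0.5 * value(0)
assert sure_gain > gamble_gain      # the sure gain is preferred

# Loss frame: a sure -50 vs. a 50/50 gamble on -100 or 0.
sure_loss = value(-50)
gamble_loss = 0.5 * value(-100) + 0.5 * value(0)
assert gamble_loss > sure_loss      # the gamble is preferred over a sure loss
```

Under this function the same objective stakes flip preference with the frame, which is the framing effect the study reports GPT-3.5 reversing and GPT-4 ignoring.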


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a4d/12405550/60b495b406f9/41598_2025_17188_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a4d/12405550/952106a5de44/41598_2025_17188_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a4d/12405550/9d017593137d/41598_2025_17188_Fig3_HTML.jpg

Similar articles

1. Evaluating the ability of large language models to predict human social decisions.
Sci Rep. 2025 Sep 2;15(1):32290. doi: 10.1038/s41598-025-17188-7.
2. Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study.
J Med Internet Res. 2025 May 20;27:e69910. doi: 10.2196/69910.
3. Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers.
J Med Internet Res. 2025 Jul 14;27:e64452. doi: 10.2196/64452.
4. Use of Large Language Models to Classify Epidemiological Characteristics in Synthetic and Real-World Social Media Posts About Conjunctivitis Outbreaks: Infodemiology Study.
J Med Internet Res. 2025 Jul 2;27:e65226. doi: 10.2196/65226.
5. Improving Large Language Models' Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation.
JMIR Med Inform. 2025 Jul 24;13:e66476. doi: 10.2196/66476.
6. Large Language Models and Empathy: Systematic Review.
J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
7. Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis.
JMIR Cancer. 2025 Apr 7;11:e67914. doi: 10.2196/67914.
8. Evaluating Large Language Models for Enhancing Radiology Specialty Examination: A Comparative Study with Human Performance.
Acad Radiol. 2025 May 27. doi: 10.1016/j.acra.2025.05.023.
9. A publicly available benchmark for assessing large language models' ability to predict how humans balance self-interest and the interest of others.
Sci Rep. 2025 Jul 1;15(1):21428. doi: 10.1038/s41598-025-01715-7.
10. Classifying Patient Complaints Using Artificial Intelligence-Powered Large Language Models: Cross-Sectional Study.
J Med Internet Res. 2025 Aug 6;27:e74231. doi: 10.2196/74231.
