Xiao Feng, Wang X T XiaoTian
Department of Applied Psychology, School of Humanities and Social Science, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Boulevard, 518172, Shenzhen, China.
Sci Rep. 2025 Sep 2;15(1):32290. doi: 10.1038/s41598-025-17188-7.
Recent advances in large language models (LLMs) have highlighted their potential to predict human decisions. In two studies, we compared predictions by GPT-3.5 and GPT-4 across 51 scenarios (9,600 responses) against published data from 2,104 human participants within an evolutionary-psychology framework. We further examined our findings with GPT-4o across eight social-group and kinship conditions (1,600 responses). Our results revealed differences between human behavior and LLM predictions: humans showed greater sensitivity to kinship and group size than the LLMs when making life-death decisions. In financial domains, LLMs aligned more closely with humans who had a higher risk-seeking preference. While human choices followed the value function of prospect theory (risk-averse in gains, risk-seeking in losses), LLMs often predicted reversed patterns. GPT-3.5 matched the average level of human risk preference but showed reversed framing effects; GPT-4 was indiscriminately risk-averse across social contexts. While humans were more risk-seeking in small or kin groups than in large groups, GPT-4o made the opposite predictions. Our results suggest a set of criteria for a psychological version of the Turing Test, reflected in framing effects and social context-dependent risk preference involving kinship, group size, social relations, sense of fairness, self-age awareness, public vs. personal properties, and social group-dependent aspiration levels.
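The pattern attributed to prospect theory's value function (risk-averse in gains, risk-seeking in losses) can be illustrated with a minimal sketch. This is not the authors' analysis: the power-function form and the parameter values (exponent 0.88, loss-aversion coefficient 2.25) are the commonly cited estimates from Tversky and Kahneman (1992), the helper names `value` and `prefers_sure_option` are hypothetical, and probability weighting is deliberately omitted to keep the example short.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of an outcome x relative to a reference point of 0.

    Assumed power-function form: concave for gains, convex and steeper
    for losses (lam > 1 expresses loss aversion).
    """
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)


def prefers_sure_option(sure, gamble_outcome, p=0.5):
    """True if a sure amount is valued above a gamble that yields
    gamble_outcome with probability p and 0 otherwise.

    Assumption: probabilities are used unweighted.
    """
    return value(sure) > p * value(gamble_outcome)


# Gain frame: a sure 50 versus a 50% chance of 100.
gain_choice = prefers_sure_option(50, 100)

# Loss frame: a sure -50 versus a 50% chance of -100.
loss_choice = prefers_sure_option(-50, -100)

print("Prefers the sure gain:", gain_choice)
print("Prefers the sure loss:", loss_choice)
```

Under these assumed parameters the sure option is preferred in the gain frame and the gamble is preferred in the loss frame, which is the human pattern the abstract contrasts with the reversed predictions of the LLMs.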