Center for Humans and Machines, Max Planck Institute for Human Development, Lentzeallee 94, 14195, Berlin, Germany.
Max Planck School of Cognition, Leipzig, Germany.
Sci Rep. 2024 Sep 27;14(1):22274. doi: 10.1038/s41598-024-73306-x.
Pre-trained large language models (LLMs) have garnered significant attention for their ability to generate human-like text and responses across various domains. This study examines the social and strategic behavior of the commonly used LLM GPT-3.5 by investigating its suggestions in well-established behavioral economics paradigms. Specifically, we focus on social preferences, including altruism, reciprocity, and fairness, in the context of two classic economic games: the Dictator Game (DG) and the Ultimatum Game (UG). Our research aims to answer three overarching questions: (1) To what extent do GPT-3.5's suggestions reflect human social preferences? (2) How do socio-demographic features of the advisee and (3) technical parameters of the model influence the suggestions of GPT-3.5? We present detailed empirical evidence from extensive experiments with GPT-3.5, analyzing its responses to various game scenarios while manipulating the demographics of the advisee and the model temperature. Our findings reveal that, in the DG, the model's suggestions are more altruistic than those of humans. We further show that the model also picks up on more subtle aspects of human social preferences: fairness and reciprocity. This research contributes to the ongoing exploration of AI-driven systems' alignment with human behavior and social norms, providing valuable insights into the behavior of pre-trained LLMs and their implications for human-AI interactions. Additionally, our study offers a methodological benchmark for future research examining human-like characteristics and behaviors in language models.
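As a concrete illustration of the experimental setup described above, the minimal Python sketch below shows how one might query GPT-3.5 via the OpenAI chat API, crossing advisee demographics with the model temperature parameter. The prompt wording, the demographic profiles, and the helper ask_dg_advice are illustrative assumptions for exposition, not the authors' actual materials or code.

    # Minimal sketch (not the authors' code): eliciting Dictator Game advice
    # from GPT-3.5 while varying advisee demographics and temperature.
    # Prompt text and demographic profiles below are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    DG_PROMPT = (
        "You are advising {advisee}. In a Dictator Game, they must split "
        "$100 between themselves and an anonymous stranger. How much "
        "(in dollars, 0-100) should they give to the stranger? "
        "Answer with a single number."
    )

    def ask_dg_advice(advisee: str, temperature: float) -> str:
        """Request one Dictator Game suggestion from GPT-3.5."""
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": DG_PROMPT.format(advisee=advisee)}],
            temperature=temperature,  # technical parameter varied in the study
            max_tokens=10,
        )
        return response.choices[0].message.content.strip()

    # Cross example socio-demographic profiles with temperature settings.
    for advisee in ["a 25-year-old woman", "a 60-year-old man"]:
        for temp in (0.0, 0.5, 1.0):
            print(advisee, temp, ask_dg_advice(advisee, temp))

In a full experiment, each cell of this demographics-by-temperature grid would be sampled repeatedly and the numeric suggestions parsed and compared against human baselines from the behavioral economics literature.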