Strategic behavior of large language models and the role of game structure versus contextual framing.

Authors

Lorè Nunzio, Heydari Babak

Affiliations

Multi-Agent Intelligent Complex Systems (MAGICS) Lab, Network Science Institute, Northeastern University, Boston, MA, USA.

Multi-Agent Intelligent Complex Systems (MAGICS) Lab, College of Engineering and Network Science Institute, Northeastern University, Boston, MA, USA.

Publication information

Sci Rep. 2024 Aug 9;14(1):18490. doi: 10.1038/s41598-024-69032-z.

DOI: 10.1038/s41598-024-69032-z

PMID: 39122801

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11316122/

Abstract

This paper investigates the strategic behavior of large language models (LLMs) across various game-theoretic settings, scrutinizing the interplay between game structure and contextual framing in decision-making. We focus our analysis on three advanced LLMs (GPT-3.5, GPT-4, and LLaMa-2) and how they navigate both the intrinsic aspects of different games and the nuances of their surrounding contexts. Our results highlight discernible patterns in each model's strategic approach. GPT-3.5 shows significant sensitivity to context but lags in its capacity for abstract strategic decision-making. Conversely, both GPT-4 and LLaMa-2 demonstrate a more balanced sensitivity to game structures and contexts, albeit with crucial differences. Specifically, GPT-4 prioritizes the internal mechanics of the game over its contextual backdrop but does so with only a coarse differentiation among game types. In contrast, LLaMa-2 reflects a more granular understanding of individual game structures, while also giving due weight to contextual elements. This suggests that LLaMa-2 is better equipped to navigate the subtleties of different strategic scenarios while also incorporating context into its decision-making, whereas GPT-4 adopts a more generalized, structure-centric strategy.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab21/11316122/e22017f817e3/41598_2024_69032_Fig1_HTML.jpg

Similar articles

GPT-3.5 altruistic advice is sensitive to reciprocal concerns but not to strategic risk.

Sci Rep. 2024 Sep 27;14(1):22274. doi: 10.1038/s41598-024-73306-x.

Effect of Private Deliberation: Deception of Large Language Models in Game Play.

Entropy (Basel). 2024 Jun 18;26(6):524. doi: 10.3390/e26060524.

Can large language models understand molecules?

BMC Bioinformatics. 2024 Jun 26;25(1):225. doi: 10.1186/s12859-024-05847-x.

Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.

JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.

Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study.

J Med Internet Res. 2024 Apr 17;26:e56655. doi: 10.2196/56655.

Evaluation of large language models in breast cancer clinical scenarios: a comparative analysis based on ChatGPT-3.5, ChatGPT-4.0, and Claude2.

Int J Surg. 2024 Apr 1;110(4):1941-1950. doi: 10.1097/JS9.0000000000001066.

Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: A head-to-head cross-sectional study.

PLOS Digit Health. 2024 Apr 17;3(4):e0000341. doi: 10.1371/journal.pdig.0000341. eCollection 2024 Apr.

Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists.

Psychiatry Clin Neurosci. 2024 Jun;78(6):347-352. doi: 10.1111/pcn.13656. Epub 2024 Feb 26.

Comparison of Prompt Engineering and Fine-Tuning Strategies in Large Language Models in the Classification of Clinical Notes.

medRxiv. 2024 Feb 8:2024.02.07.24302444. doi: 10.1101/2024.02.07.24302444.

Cited by

Comparing AI and human decision-making mechanisms in daily collaborative experiments.

iScience. 2025 May 21;28(6):112711. doi: 10.1016/j.isci.2025.112711. eCollection 2025 Jun 20.

Static network structure cannot stabilize cooperation among large language model agents.

PLoS One. 2025 May 22;20(5):e0320094. doi: 10.1371/journal.pone.0320094. eCollection 2025.

References

Testing theory of mind in large language models and humans.

Nat Hum Behav. 2024 Jul;8(7):1285-1295. doi: 10.1038/s41562-024-01882-z. Epub 2024 May 20.

GPT-4 passes the bar exam.

Philos Trans A Math Phys Eng Sci. 2024 Apr 15;382(2270):20230254. doi: 10.1098/rsta.2023.0254. Epub 2024 Feb 26.

A Turing test of whether AI chatbots are behaviorally similar to humans.

Proc Natl Acad Sci U S A. 2024 Feb 27;121(9):e2313925121. doi: 10.1073/pnas.2313925121. Epub 2024 Feb 22.

Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT.

Nat Comput Sci. 2023 Oct;3(10):833-838. doi: 10.1038/s43588-023-00527-x. Epub 2023 Oct 5.

The emergence of economic rationality of GPT.

Proc Natl Acad Sci U S A. 2023 Dec 19;120(51):e2316205120. doi: 10.1073/pnas.2316205120. Epub 2023 Dec 12.

Artificial intelligence takes center stage: exploring the capabilities and implications of ChatGPT and other AI-assisted technologies in scientific research and education.

Immunol Cell Biol. 2023 Nov-Dec;101(10):923-935. doi: 10.1111/imcb.12689. Epub 2023 Sep 18.

Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations.

Neurosurgery. 2023 Dec 1;93(6):1353-1365. doi: 10.1227/neu.0000000000002632. Epub 2023 Aug 15.

Comparison of GPT-3.5, GPT-4, and human user performance on a practice ophthalmology written examination.

Eye (Lond). 2023 Dec;37(17):3694-3695. doi: 10.1038/s41433-023-02564-2. Epub 2023 May 8.

Mine or ours? Neural basis of the exploitation of common-pool resources.

Soc Cogn Affect Neurosci. 2022 Sep 1;17(9):837-849. doi: 10.1093/scan/nsac008.

Network Modularity is essential for evolution of cooperation under uncertainty.

Sci Rep. 2015 Apr 7;5:9340. doi: 10.1038/srep09340.
