Huang Yijie, Chen Yanhong
School of Information Technology and Artificial Intelligence, Zhejiang University of Finance & Economics, Hangzhou 310018, China.
Chaos. 2025 Apr 1;35(4). doi: 10.1063/5.0267846.
Reinforcement learning has been empirically demonstrated to facilitate cooperation in game models. However, traditional research has primarily focused on two-strategy frameworks (cooperation and defection), which inadequately capture the complexity of real-world scenarios. To address this limitation, we integrated Q-learning into the prisoner's dilemma game, incorporating three strategies: cooperation, defection, and going it alone. We defined each agent's state by the number of neighboring agents opting for cooperation and included a social payoff in the Q-table update process. Numerical simulations indicate that this framework significantly enhances cooperation and average payoff as the degree of social attention increases. This phenomenon occurs because the social payoff enables individuals to move beyond narrow self-interest and consider broader social benefits. Additionally, we conducted a thorough analysis of the mechanisms underlying this enhancement of cooperation.
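The abstract describes a Q-learning agent whose state is the number of cooperating neighbors and whose reward mixes its own payoff with a social payoff. A minimal sketch of such an update rule follows; the payoff values, the linear blending of personal and social payoff via a weight `w` (standing in for the "degree of social attention"), and all hyperparameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hedged sketch of a three-strategy (cooperate / defect / go it alone)
# Q-learning update with a social-payoff term. Payoff values, the
# blending weight w, and all hyperparameters are assumptions.

ACTIONS = ["C", "D", "L"]  # cooperate, defect, loner ("going it alone")
R, S, T, P, SIGMA = 1.0, 0.0, 1.5, 0.0, 0.3  # assumed PD payoffs; SIGMA = loner payoff

def pair_payoff(a, b):
    """Payoff to a focal agent playing action a against a neighbor playing b."""
    if a == "L" or b == "L":
        return SIGMA  # in loner variants, any interaction with a loner yields SIGMA
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return table[(a, b)]

class QAgent:
    def __init__(self, n_neighbors=4, alpha=0.1, gamma=0.9, eps=0.05, w=0.5):
        # State = number of cooperating neighbors, so 0..n_neighbors states.
        self.q = np.zeros((n_neighbors + 1, len(ACTIONS)))
        self.alpha, self.gamma, self.eps, self.w = alpha, gamma, eps, w

    def act(self, state, rng):
        # epsilon-greedy action selection over the three strategies
        if rng.random() < self.eps:
            return int(rng.integers(len(ACTIONS)))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, own_payoff, social_payoff, next_state):
        # Blend personal and social payoff; w plays the role of the
        # "degree of social attention" (the exact form is an assumption).
        reward = (1 - self.w) * own_payoff + self.w * social_payoff
        best_next = np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (
            reward + self.gamma * best_next - self.q[state, action]
        )
```

With `w = 0`, the agent maximizes only its own payoff, recovering standard self-interested Q-learning; raising `w` makes the reward signal increasingly reflect neighborhood-level benefit, which is the mechanism the abstract credits for enhanced cooperation.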