Numerical analysis of a reinforcement learning model with the dynamic aspiration level in the iterated Prisoner's dilemma.

Author information

Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo 113-8656, Japan.

Publication information

J Theor Biol. 2011 Jun 7;278(1):55-62. doi: 10.1016/j.jtbi.2011.03.005. Epub 2011 Mar 29.

Abstract

Humans and other animals can adapt their social behavior in response to environmental cues, including the feedback obtained through experience. Nevertheless, the effects of experience-based learning by players on the evolution and maintenance of cooperation in social dilemma games remain relatively unclear. Some previous literature showed that mutual cooperation of learning players is difficult to attain or requires a sophisticated learning model. In the context of the iterated Prisoner's dilemma, we numerically examine the performance of a reinforcement learning model. Our model modifies those of Karandikar et al. (1998), Posch et al. (1999), and Macy and Flache (2002), in which players are satisfied if the obtained payoff exceeds a dynamic threshold. We show that players obeying the modified learning rule mutually cooperate with high probability if the threshold dynamics are not too fast and the association between the reinforcement signal and the action in the next round is sufficiently strong. The learning players also perform efficiently against reactive strategies. In evolutionary dynamics, they can invade a population of players adopting simpler but competitive strategies. Our version of the reinforcement learning model does not complicate the previous models and is simple yet flexible. It may serve to explore the relationships between learning and evolution in social dilemma situations.
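The mechanism the abstract describes, aspiration-based reinforcement learning in the iterated Prisoner's dilemma, can be made concrete with a short sketch. The Python code below is a minimal illustration in the spirit of the Bush-Mosteller model of Macy and Flache (2002); the payoff values, the linear aspiration update, and the parameter names lr and h are assumptions for exposition, not the exact model of this paper.

```python
import random

# Standard Prisoner's Dilemma payoffs with T > R > P > S. These values
# are illustrative assumptions; the paper's payoff matrix may differ.
R, S, T, P = 3.0, 0.0, 5.0, 1.0

def payoff(me, other):
    """Payoff to `me` given both players' actions ('C' or 'D')."""
    return {('C', 'C'): R, ('C', 'D'): S,
            ('D', 'C'): T, ('D', 'D'): P}[(me, other)]

class AspirationLearner:
    """Bush-Mosteller-style learner with a dynamic aspiration level.

    p     -- probability of cooperating in the next round
    aspir -- aspiration level (the dynamic threshold)
    lr    -- learning rate: strength of the association between the
             reinforcement signal and the action in the next round
    h     -- habituation rate: speed of the threshold dynamics
    (Parameter names and values are illustrative, not from the paper.)
    """

    def __init__(self, p=0.5, aspir=2.0, lr=0.8, h=0.05):
        self.p, self.aspir, self.lr, self.h = p, aspir, lr, h

    def act(self):
        return 'C' if random.random() < self.p else 'D'

    def update(self, action, pay):
        # Stimulus in [-1, 1]: positive iff the payoff beat the aspiration.
        s = max(-1.0, min(1.0, (pay - self.aspir) / (T - S)))
        if s >= 0:
            # Satisfied: reinforce the action just taken.
            if action == 'C':
                self.p += self.lr * s * (1.0 - self.p)
            else:
                self.p -= self.lr * s * self.p
        else:
            # Dissatisfied: inhibit the action just taken.
            if action == 'C':
                self.p -= self.lr * (-s) * self.p
            else:
                self.p += self.lr * (-s) * (1.0 - self.p)
        # The aspiration slowly tracks the experienced payoff; a small h
        # keeps the threshold dynamics slow.
        self.aspir = (1.0 - self.h) * self.aspir + self.h * pay

# Two learners playing the iterated game. With slow threshold dynamics
# (small h) and strong reinforcement (large lr), mutual cooperation
# (both p near 1) can emerge.
a, b = AspirationLearner(), AspirationLearner()
for _ in range(10000):
    xa, xb = a.act(), b.act()
    a.update(xa, payoff(xa, xb))
    b.update(xb, payoff(xb, xa))
print(f"p_cooperate: a={a.p:.3f}, b={b.p:.3f}")
```

In this sketch, h plays the role of the threshold dynamics speed and lr the strength of the reinforcement-action association, the two quantities the abstract identifies as decisive for mutual cooperation.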
