School of Mathematics, University of Bristol, Bristol, United Kingdom.
School of Biological Sciences, University of Bristol, Bristol, United Kingdom.
PLoS One. 2021 Feb 5;16(2):e0246588. doi: 10.1371/journal.pone.0246588. eCollection 2021.
We focus on learning during development in a group of individuals that play a competitive game with each other. The game has two actions and there is negative frequency dependence. We define the distribution of actions by group members to be an equilibrium configuration if no individual can improve its payoff by unilaterally changing its action. We show that at this equilibrium, one action is preferred in the sense that those taking the preferred action have a higher payoff than those taking the other, more prosocial, action. We explore the consequences of a simple 'unbiased' reinforcement learning rule during development, showing that groups reach an approximate equilibrium distribution, so that some achieve a higher payoff than others. Because there is learning, an individual's behaviour can influence the future behaviour of others. We show that, as a consequence, there is the potential for an individual to exploit others by influencing them to be the ones to take the non-preferred action. Using an evolutionary simulation, we show that population members can avoid being exploited by over-valuing rewards obtained from the preferred option during learning, an example of a bias that is 'rational'.
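As an illustration of the equilibrium definition in the abstract, the following is a minimal sketch, not the authors' model. The group size `N` and the payoff functions `payoff_a` and `payoff_b` are hypothetical, chosen only so that each action pays less as more individuals take it (negative frequency dependence). The function `equilibrium_counts` is likewise an illustrative name; it lists every split of the group in which no individual gains by unilaterally switching action.

```python
from typing import Callable, List


def equilibrium_counts(
    n: int,
    payoff_a: Callable[[int], int],
    payoff_b: Callable[[int], int],
) -> List[int]:
    """Return every k (number of individuals taking action A, out of n)
    at which no individual can improve its payoff by switching alone.

    payoff_a(k): payoff to an A-player when k individuals take A.
    payoff_b(k): payoff to a B-player when k individuals take A.
    """
    stable = []
    for k in range(n + 1):
        # An A-player who switches leaves k - 1 on A and earns payoff_b(k - 1).
        a_stays = k == 0 or payoff_a(k) >= payoff_b(k - 1)
        # A B-player who switches puts k + 1 on A and earns payoff_a(k + 1).
        b_stays = k == n or payoff_b(k) >= payoff_a(k + 1)
        if a_stays and b_stays:
            stable.append(k)
    return stable


N = 10  # hypothetical group size


def payoff_a(k: int) -> int:
    # Hypothetical: A pays less as more individuals take A.
    return 50 - 5 * k


def payoff_b(k: int) -> int:
    # Hypothetical: B pays less as more individuals take B (i.e. as k falls).
    return 5 + k


EQ = equilibrium_counts(N, payoff_a, payoff_b)
print(EQ)  # [7]
print(payoff_a(7), payoff_b(7))  # 15 12
```

With these assumed payoffs the only equilibrium configuration has seven individuals taking A and three taking B, and the A-players earn more (15 versus 12), which mirrors the abstract's point that one action is preferred at equilibrium even though nobody can gain by switching.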