Department of Mathematics, King's College London, Strand, London, WC2R 2LS, United Kingdom.
Theoretical Physics, School of Physics and Astronomy, The University of Manchester, Manchester M13 9PL, United Kingdom.
Sci Rep. 2017 Jan 18;7:40580. doi: 10.1038/srep40580.
It is known that the learning of players who interact in a repeated game can be interpreted as an evolutionary process in a population of ideas. These analogies have so far mostly been established in deterministic models, and memory loss in learning has been seen to act similarly to mutation in evolution. Here we propose a representation of reinforcement learning as a stochastic process in finite 'populations of ideas'. The resulting birth-death dynamics has absorbing states and allows for the extinction or fixation of ideas, which marks a key difference from mutation-selection processes in finite populations. We characterize the outcome of evolution in populations of ideas for several classes of symmetric and asymmetric games.
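To make the notion of a birth-death process with absorbing states concrete, the following is a minimal sketch and not the model of the paper: it uses a standard fitness-proportional Moran-type update for two competing ideas, A and B, in a population of fixed size. The payoff matrix, the well-mixed matching, and the function names `transition_probs`, `fixation_probability` and `simulate` are all assumptions made for illustration; the reinforcement-learning dynamics described in the abstract may use different transition rates.

```python
import random


def transition_probs(i, n, payoff):
    """Return (T+, T-) when i of the n ideas are of type A.

    Assumption: fitness-proportional birth, uniform death, positive payoffs.
    This is a generic Moran-type rule, not the paper's learning rule.
    """
    (a, b), (c, d) = payoff
    # Average payoff of each type against a randomly chosen other idea.
    f_a = (a * (i - 1) + b * (n - i)) / (n - 1)
    f_b = (c * i + d * (n - i - 1)) / (n - 1)
    total = i * f_a + (n - i) * f_b
    t_plus = (i * f_a / total) * ((n - i) / n)   # A reproduces, B is removed
    t_minus = ((n - i) * f_b / total) * (i / n)  # B reproduces, A is removed
    return t_plus, t_minus


def fixation_probability(i0, n, payoff):
    """Probability that idea A takes over, starting from i0 copies.

    Uses the standard closed form for one-dimensional birth-death chains
    with absorbing states at 0 and n.
    """
    if i0 == 0:
        return 0.0
    running = 1.0  # 1 + sum_{k<=j} prod_{m<=k} T-_m / T+_m
    prod = 1.0
    numerator = 1.0
    for j in range(1, n):
        t_plus, t_minus = transition_probs(j, n, payoff)
        prod *= t_minus / t_plus
        running += prod
        if j == i0 - 1:
            numerator = running
    return numerator / running


def simulate(i0, n, payoff, rng):
    """Run one trajectory until extinction (0) or fixation (n) of idea A."""
    i = i0
    while 0 < i < n:
        t_plus, t_minus = transition_probs(i, n, payoff)
        u = rng.random()
        if u < t_plus:
            i += 1
        elif u < t_plus + t_minus:
            i -= 1
    return i


if __name__ == "__main__":
    neutral = ((1, 1), (1, 1))  # hypothetical payoffs: both ideas equivalent
    print(fixation_probability(3, 10, neutral))
    print(simulate(5, 10, neutral, random.Random(0)))
```

Because there is no mutation term, the states with 0 and with n copies of idea A can never be left, so every trajectory ends in extinction or fixation. In the neutral case the fixation probability reduces to the initial fraction i0/n, which is a convenient sanity check for the sketch.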