University of Bamberg, Markusplatz 3, D-96047, Bamberg, Germany.
Technische Universitaet Braunschweig, Spielmannstrasse 19, D-38106, Braunschweig, Germany.
Behav Processes. 2021 May;186:104370. doi: 10.1016/j.beproc.2021.104370. Epub 2021 Feb 26.
Reinforcement learning is often described by analogy to natural selection. However, there is no coherent theory relating reinforcement learning to evolution within a single formal model of selection. This paper provides the formal foundation of such a unified theory. The model is based on the most general description of natural selection as given by the Price equation. We extend the Price equation to cover reinforcement learning as the result of a behavioral selection process within individuals. Using a multilevel model of behavioral selection, we relate this process to the principle of natural selection via the concept of statistical fitness predictors. The main result is the covariance-based law of effect, which describes reinforcement learning on a molar level by means of the covariance between behavioral allocation and a statistical fitness predictor. We further demonstrate how this abstract principle can be applied to derive theoretical explanations of various empirical findings, such as conditioned reinforcement, blocking, matching, and response deprivation. Our model is the first to apply the abstract principle of selection to derive a unified description of reinforcement learning and natural selection within a single model. It provides a general analytical tool for behavioral psychology, much as the theory of natural selection does for evolutionary biology. We thus lay the formal foundation of a general theory of reinforcement as the result of behavioral selection on multiple levels.
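As a rough illustration of the Price equation that the abstract builds on, the sketch below checks the textbook identity w̄·Δz̄ = Cov(w, z) + E(w·Δz) on a small toy population. The function name `price_terms`, the variable names, and the numbers are assumptions made for this example only. The paper's own behavioral extension and its covariance-based law of effect are not reproduced here.

```python
def price_terms(q, w, z, dz):
    """Return (lhs, selection, transmission) for the standard Price equation.

    q  : frequencies of each type in the parent population (sum to 1)
    w  : fitness of each type
    z  : trait value of each type
    dz : change in trait value between a type and its descendants
    All names are illustrative assumptions, not taken from the paper.
    """
    w_bar = sum(qi * wi for qi, wi in zip(q, w))  # mean fitness
    z_bar = sum(qi * zi for qi, zi in zip(q, z))  # mean trait before selection

    # Descendant frequencies: each type is reweighted by its relative fitness.
    q_next = [qi * wi / w_bar for qi, wi in zip(q, w)]
    z_bar_next = sum(qn * (zi + dzi) for qn, zi, dzi in zip(q_next, z, dz))

    lhs = w_bar * (z_bar_next - z_bar)  # w_bar * (change in mean trait)

    # Selection term: frequency-weighted covariance between fitness and trait.
    selection = sum(qi * wi * zi for qi, wi, zi in zip(q, w, z)) - w_bar * z_bar

    # Transmission term: frequency-weighted expectation of w * dz.
    transmission = sum(qi * wi * dzi for qi, wi, dzi in zip(q, w, dz))

    return lhs, selection, transmission


# Toy population with three types (made-up numbers).
q = [0.5, 0.3, 0.2]
w = [1.0, 2.0, 0.5]
z = [0.2, 0.6, 0.9]
dz = [0.0, 0.05, -0.1]

lhs, selection, transmission = price_terms(q, w, z, dz)
print(abs(lhs - (selection + transmission)) < 1e-12)
```

The identity holds for any choice of frequencies, fitness values, and trait values, so the two sides agree up to floating-point error. Setting every entry of `dz` to zero leaves only the covariance term, which is the pure selection case.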