Brooklyn College and Graduate Center, CUNY, New York, NY, USA.
Stockholm University, Stockholm, Sweden.
Psychon Bull Rev. 2020 Dec;27(6):1166-1194. doi: 10.3758/s13423-020-01749-0.
We present a new mathematical formulation of associative learning focused on non-human animals, which we call A-learning. Building on current animal learning theory and machine learning, A-learning is composed of two learning equations, one for stimulus-response values and one for stimulus values (conditioned reinforcement). A third equation implements decision-making by mapping stimulus-response values to response probabilities. We show that A-learning can reproduce the main features of: instrumental acquisition, including the effects of signaled and unsignaled non-contingent reinforcement; Pavlovian acquisition, including higher-order conditioning, omission training, autoshaping, and differences in form between conditioned and unconditioned responses; acquisition of avoidance responses; acquisition and extinction of instrumental chains and Pavlovian higher-order conditioning; Pavlovian-to-instrumental transfer; Pavlovian and instrumental outcome revaluation effects, including insight into why these effects vary greatly with training procedures and with the proximity of a response to the reinforcer. We discuss the differences between current theory and A-learning, such as its lack of stimulus-stimulus and response-stimulus associations, and compare A-learning with other temporal-difference models from machine learning, such as Q-learning, SARSA, and the actor-critic model. We conclude that A-learning may offer a more convenient view of associative learning than current mathematical models, and point out areas that need further development.
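For readers who want a concrete picture of the three equations described above, one possible form (a minimal sketch based only on this abstract; the symbols alpha, beta, u, v, and w and the exact update rules are assumptions, not necessarily the paper's notation) is:

    v(S,B) <- v(S,B) + alpha_v [ u(S') + w(S') - v(S,B) ]      (stimulus-response values)
    w(S)   <- w(S)   + alpha_w [ u(S') + w(S') - w(S) ]        (stimulus values, conditioned reinforcement)
    Pr(B|S) = exp(beta * v(S,B)) / sum over B' of exp(beta * v(S,B'))   (decision rule)

Here S is the current stimulus, B a response, S' the stimulus that follows the response, u(S') its primary reinforcement value, and the softmax decision rule maps stimulus-response values to response probabilities, as the abstract describes.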