Peter Dayan, Yael Niv, Ben Seymour, Nathaniel D. Daw
Gatsby Computational Neuroscience Unit, UCL, 17 Queen Square, London, UK.
Neural Netw. 2006 Oct;19(8):1153-60. doi: 10.1016/j.neunet.2006.03.002. Epub 2006 Aug 30.
Most reinforcement learning models of animal conditioning operate under the convenient, though fictive, assumption that Pavlovian conditioning concerns prediction learning whereas instrumental conditioning concerns action learning. However, it is only through Pavlovian responses that Pavlovian prediction learning is evident, and these responses can act against the instrumental interests of the subjects. This can be seen in both experimental and natural circumstances. In this paper we study the consequences of importing this competition into a reinforcement learning context, and demonstrate the resulting effects in an omission schedule and a maze navigation task. The misbehavior created by Pavlovian values can be quite debilitating; we discuss how it may be disciplined.
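The abstract does not give the model's equations. The sketch below shows one way Pavlovian values can work against instrumental interests in an omission schedule, using assumptions that are not the paper's exact formulation. On each trial a cue appears. Choosing "go" (approach) cancels the reward, while choosing "nogo" (withhold) delivers it. Instrumental action values `q` are learned from outcomes. A Pavlovian value `v` of the cue adds a bias toward approach, weighted by a hypothetical parameter `omega`. The function name `simulate_omission` and the parameters `alpha`, `beta`, and `omega` are illustrative only.

```python
import math
import random


def simulate_omission(omega, n_trials=5000, alpha=0.1, beta=5.0, seed=0):
    """Hypothetical omission-schedule simulation (illustrative, not the paper's model).

    Reward 1.0 is delivered only when the agent withholds ("nogo").
    omega: assumed weight of the Pavlovian bias toward approaching ("go").
    Returns the fraction of "go" choices over the second half of training.
    """
    rng = random.Random(seed)
    v = 0.0                        # Pavlovian (state) value of the cue
    q = {"go": 0.0, "nogo": 0.0}   # instrumental action values
    late_go = 0
    for t in range(n_trials):
        # Assumed choice rule: logistic (two-action softmax) over the
        # instrumental advantage of "go", plus a Pavlovian approach bias omega * v.
        pref = beta * (q["go"] + omega * v - q["nogo"])
        p_go = 1.0 / (1.0 + math.exp(-pref))
        action = "go" if rng.random() < p_go else "nogo"
        # Omission contingency: approaching cancels the reward.
        reward = 0.0 if action == "go" else 1.0
        # Delta-rule updates for a single-step trial.
        v += alpha * (reward - v)
        q[action] += alpha * (reward - q[action])
        if t >= n_trials // 2 and action == "go":
            late_go += 1
    return late_go / (n_trials - n_trials // 2)


for omega in (0.0, 3.0):
    print(f"omega={omega}: late 'go' rate = {simulate_omission(omega):.3f}")
```

Under these assumed settings, the agent with `omega=0` should learn to withhold on almost every trial. With `omega=3`, the Pavlovian bias keeps it approaching on a substantial fraction of trials and forfeiting reward. Whenever withholding raises the cue's value, the stronger approach tendency brings that value back down.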