The misbehavior of value and the discipline of the will.

Authors

Dayan Peter, Niv Yael, Seymour Ben, Daw Nathaniel D

Affiliation

Gatsby Computational Neuroscience Unit, UCL, 17 Queen Square, London, UK.

Publication information

Neural Netw. 2006 Oct;19(8):1153-60. doi: 10.1016/j.neunet.2006.03.002. Epub 2006 Aug 30.

Abstract

Most reinforcement learning models of animal conditioning operate under the convenient, though fictive, assumption that Pavlovian conditioning concerns prediction learning whereas instrumental conditioning concerns action learning. However, it is only through Pavlovian responses that Pavlovian prediction learning is evident, and these responses can act against the instrumental interests of the subjects. This can be seen in both experimental and natural circumstances. In this paper we study the consequences of importing this competition into a reinforcement learning context, and demonstrate the resulting effects in an omission schedule and a maze navigation task. The misbehavior created by Pavlovian values can be quite debilitating; we discuss how it may be disciplined.
