Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward.

Author information

Tanimoto Sai, Kondo Masashi, Morita Kenji, Yoshida Eriko, Matsuzaki Masanori

Affiliations

Department of Physiology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.

Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan.

Publication information

Front Behav Neurosci. 2020 Sep 4;14:141. doi: 10.3389/fnbeh.2020.00141. eCollection 2020.

Abstract

"To do or not to do" is a fundamental decision that has to be made in daily life. Behaviors related to multiple "to do" choice tasks have long been explained by reinforcement learning, and "to do or not to do" tasks such as the go/no-go task have also been recently discussed within the framework of reinforcement learning. In this learning framework, alternative actions and/or the non-action to take are determined by evaluating explicitly given (overt) reward and punishment. However, we assume that there are real life cases in which an action/non-action is repeated, even though there is no obvious reward or punishment, because implicitly given outcomes such as saving physical energy and regret (we refer to this as "covert reward") can affect the decision-making. In the current task, mice chose to pull a lever or not according to two tone cues assigned with different water reward probabilities (70% and 30% in condition 1, and 30% and 10% in condition 2). As the mice learned, the probability that they would choose to pull the lever decreased (<0.25) in trials with a 30% reward probability cue (30% cue) in condition 1, and in trials with a 10% cue in condition 2, but increased (>0.8) in trials with a 70% cue in condition 1 and a 30% cue in condition 2, even though a non-pull was followed by neither an overt reward nor avoidance of overt punishment in any trial. This behavioral tendency was not well explained by a combination of commonly used Q-learning models, which take only the action choice with an overt reward outcome into account. Instead, we found that the non-action preference of the mice was best explained by Q-learning models, which regarded the non-action as the other choice, and updated non-action values with a covert reward. We propose that "doing nothing" can be actively chosen as an alternative to "doing something," and that a covert reward could serve as a reinforcer of "doing nothing."
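The model class described in the abstract, Q-learning in which the non-action is treated as a second choice and its value is updated with a covert reward, can be illustrated with a minimal sketch. The abstract does not give the update equations or fitted parameters, so everything below is an assumption made for illustration: the function name `simulate_pull_rate`, the softmax choice rule, the learning rate `alpha`, the inverse temperature `beta`, and the covert reward magnitude `covert_reward` are all hypothetical and are not the authors' values.

```python
import math
import random


def simulate_pull_rate(reward_prob, covert_reward=0.5, alpha=0.1,
                       beta=5.0, n_trials=2000, seed=0):
    """Return the fraction of trials in which the simulated agent pulls.

    Assumed (hypothetical) model for a single tone cue:
      - two options, "pull" and "non-pull", each with its own Q-value;
      - a pull yields an overt reward of 1 with probability reward_prob;
      - a non-pull yields no overt outcome, but its Q-value is updated
        toward a fixed covert reward.
    """
    rng = random.Random(seed)  # local generator for reproducibility
    q_pull = 0.0
    q_nonpull = 0.0
    n_pulls = 0

    for _ in range(n_trials):
        # Softmax over two options reduces to a logistic function
        # of the difference between the two Q-values.
        p_pull = 1.0 / (1.0 + math.exp(-beta * (q_pull - q_nonpull)))

        if rng.random() < p_pull:
            n_pulls += 1
            overt = 1.0 if rng.random() < reward_prob else 0.0
            q_pull += alpha * (overt - q_pull)
        else:
            # No overt outcome: update the non-action with the covert reward.
            q_nonpull += alpha * (covert_reward - q_nonpull)

    return n_pulls / n_trials


if __name__ == "__main__":
    # Cue reward probabilities of condition 1 in the abstract (70% and 30%).
    for prob in (0.7, 0.3):
        print(prob, simulate_pull_rate(prob))
```

In this sketch the non-pull value converges toward the covert reward, so the agent pulls more often for a cue whose reward probability exceeds that value and less often for a cue below it. It is not intended to reproduce the reported pull probabilities in either condition; that would require the model variants and fitted parameters from the full paper.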

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/657915823f60/fnbeh-14-00141-g001.jpg
