Non-action Learning: Saving Action-Associated Cost Serves as a Covert Reward.

Authors

Tanimoto Sai, Kondo Masashi, Morita Kenji, Yoshida Eriko, Matsuzaki Masanori

Affiliations

Department of Physiology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.

Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan.

Publication information

Front Behav Neurosci. 2020 Sep 4;14:141. doi: 10.3389/fnbeh.2020.00141. eCollection 2020.

DOI:10.3389/fnbeh.2020.00141
PMID:33100979
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7498735/
Abstract

"To do or not to do" is a fundamental decision that has to be made in daily life. Behaviors related to multiple "to do" choice tasks have long been explained by reinforcement learning, and "to do or not to do" tasks such as the go/no-go task have also been recently discussed within the framework of reinforcement learning. In this learning framework, alternative actions and/or the non-action to take are determined by evaluating explicitly given (overt) reward and punishment. However, we assume that there are real life cases in which an action/non-action is repeated, even though there is no obvious reward or punishment, because implicitly given outcomes such as saving physical energy and regret (we refer to this as "covert reward") can affect the decision-making. In the current task, mice chose to pull a lever or not according to two tone cues assigned with different water reward probabilities (70% and 30% in condition 1, and 30% and 10% in condition 2). As the mice learned, the probability that they would choose to pull the lever decreased (<0.25) in trials with a 30% reward probability cue (30% cue) in condition 1, and in trials with a 10% cue in condition 2, but increased (>0.8) in trials with a 70% cue in condition 1 and a 30% cue in condition 2, even though a non-pull was followed by neither an overt reward nor avoidance of overt punishment in any trial. This behavioral tendency was not well explained by a combination of commonly used Q-learning models, which take only the action choice with an overt reward outcome into account. Instead, we found that the non-action preference of the mice was best explained by Q-learning models, which regarded the non-action as the other choice, and updated non-action values with a covert reward. We propose that "doing nothing" can be actively chosen as an alternative to "doing something," and that a covert reward could serve as a reinforcer of "doing nothing."
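The modeling idea in the abstract, treating non-action as a second choice whose value is updated with a covert reward, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the delta-rule update, the two-option softmax, the parameter values (`alpha`, `beta`, `covert_reward`), and the function names are hypothetical, and the authors' actual models may differ.

```python
import math
import random


def update_q(q, outcome, alpha):
    """Delta-rule update: move q toward the outcome by a fraction alpha."""
    return q + alpha * (outcome - q)


def pull_probability(q_pull, q_nonpull, beta):
    """Two-option softmax, written as a logistic function of the value gap."""
    return 1.0 / (1.0 + math.exp(-beta * (q_pull - q_nonpull)))


def simulate(reward_probs, n_trials=4000, alpha=0.1, beta=5.0,
             covert_reward=0.5, seed=0):
    """Simulate a cue-guided pull / non-pull task.

    reward_probs maps each cue name to its water reward probability.
    alpha, beta, and covert_reward are illustrative values (assumptions).
    Returns a list of (cue, pulled) tuples, one per trial.
    """
    rng = random.Random(seed)
    cues = list(reward_probs)
    q_pull = {cue: 0.0 for cue in cues}
    q_nonpull = {cue: 0.0 for cue in cues}
    history = []
    for _ in range(n_trials):
        cue = rng.choice(cues)
        p = pull_probability(q_pull[cue], q_nonpull[cue], beta)
        if rng.random() < p:
            # Pull: the outcome is the overt water reward (1 or 0).
            reward = 1.0 if rng.random() < reward_probs[cue] else 0.0
            q_pull[cue] = update_q(q_pull[cue], reward, alpha)
            history.append((cue, True))
        else:
            # Non-pull: no overt outcome, so the non-action value is
            # updated with the assumed covert reward instead.
            q_nonpull[cue] = update_q(q_nonpull[cue], covert_reward, alpha)
            history.append((cue, False))
    return history


def pull_rate(history, cue, last_n=1000):
    """Fraction of pull choices for one cue over the last last_n trials."""
    recent = [pulled for c, pulled in history[-last_n:] if c == cue]
    return sum(recent) / len(recent) if recent else float("nan")


if __name__ == "__main__":
    # Reward probabilities from condition 1 of the abstract.
    history = simulate({"70% cue": 0.7, "30% cue": 0.3})
    for cue in ("70% cue", "30% cue"):
        print(cue, round(pull_rate(history, cue), 2))
```

In this sketch the non-pull value drifts toward `covert_reward` whenever the agent withholds the action, so the pull rate depends on whether the cue's expected overt reward lies above or below that level. Passing `{"30% cue": 0.3, "10% cue": 0.1}` gives a condition 2 analogue, although a fixed `covert_reward` is a simplification and is not claimed to reproduce the reported behavior.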

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/657915823f60/fnbeh-14-00141-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/f872b94f260d/fnbeh-14-00141-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/8d334f31a496/fnbeh-14-00141-g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/6c68f7bf963d/fnbeh-14-00141-g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/a5f234690eaf/fnbeh-14-00141-g005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/2157765254e5/fnbeh-14-00141-g006.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/9973cd36ab38/fnbeh-14-00141-g007.jpg
Figure 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/9934b5515911/fnbeh-14-00141-g008.jpg
Figure 9: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f55/7498735/302ca52b5387/fnbeh-14-00141-g009.jpg
