
The misbehavior of value and the discipline of the will.

Authors

Dayan Peter, Niv Yael, Seymour Ben, Daw Nathaniel D

Affiliation

Gatsby Computational Neuroscience Unit, UCL, 17 Queen Square, London, UK.

Publication

Neural Netw. 2006 Oct;19(8):1153-60. doi: 10.1016/j.neunet.2006.03.002. Epub 2006 Aug 30.

DOI: 10.1016/j.neunet.2006.03.002
PMID: 16938432

Abstract

Most reinforcement learning models of animal conditioning operate under the convenient, though fictive, assumption that Pavlovian conditioning concerns prediction learning whereas instrumental conditioning concerns action learning. However, it is only through Pavlovian responses that Pavlovian prediction learning is evident, and these responses can act against the instrumental interests of the subjects. This can be seen in both experimental and natural circumstances. In this paper we study the consequences of importing this competition into a reinforcement learning context, and demonstrate the resulting effects in an omission schedule and a maze navigation task. The misbehavior created by Pavlovian values can be quite debilitating; we discuss how it may be disciplined.

Similar articles

1
The misbehavior of value and the discipline of the will.
Neural Netw. 2006 Oct;19(8):1153-60. doi: 10.1016/j.neunet.2006.03.002. Epub 2006 Aug 30.
2
Magazine approach during a signal for food depends on Pavlovian, not instrumental, conditioning.
J Exp Psychol Anim Behav Process. 2013 Apr;39(2):107-16. doi: 10.1037/a0031315. Epub 2013 Feb 18.
3
Within-subject effects of number of trials in rat conditioning procedures.
J Exp Psychol Anim Behav Process. 2010 Apr;36(2):217-31. doi: 10.1037/a0016425.
4
Feeding behavior of Aplysia: a model system for comparing cellular mechanisms of classical and operant conditioning.
Learn Mem. 2006 Nov-Dec;13(6):669-80. doi: 10.1101/lm.339206.
5
Competition between an avoidance response and a safety signal: evidence for a single learning system.
Biol Psychol. 2013 Jan;92(1):9-16. doi: 10.1016/j.biopsycho.2011.09.007. Epub 2011 Sep 29.
6
[Activity of neurons in the pedunculopontine nucleus in conditioned instrumental appetitive reflex].
Zh Vyssh Nerv Deiat Im I P Pavlova. 2002 Nov-Dec;52(6):705-15.
7
Asymmetrical interactions between thirst and hunger in Pavlovian-instrumental transfer.
Q J Exp Psychol B. 1994 May;47(2):211-31.
8
Summation of reinforcement rates when conditioned stimuli are presented in compound.
J Exp Psychol Anim Behav Process. 2011 Oct;37(4):385-93. doi: 10.1037/a0024553.
9
Learning not to respond: Role of the hippocampus in withholding responses during omission training.
Behav Brain Res. 2017 Feb 1;318:61-70. doi: 10.1016/j.bbr.2016.11.011. Epub 2016 Nov 9.
10
Pavlovian to instrumental transfer: a neurobehavioural perspective.
Neurosci Biobehav Rev. 2010 Jul;34(8):1277-95. doi: 10.1016/j.neubiorev.2010.03.007. Epub 2010 Apr 10.

Cited by

1
Decrease in decision noise from adolescence into adulthood mediates an increase in more sophisticated choice behaviors and performance gain.
PLoS Biol. 2024 Nov 14;22(11):e3002877. doi: 10.1371/journal.pbio.3002877. eCollection 2024 Nov.
2
Pavlovian impatience: The anticipation of immediate rewards increases approach behaviour.
Cogn Affect Behav Neurosci. 2025 Apr;25(2):358-376. doi: 10.3758/s13415-024-01236-2. Epub 2024 Oct 28.
3
High stakes slow responding, but do not help overcome Pavlovian biases in humans.
Learn Mem. 2024 Sep 16;31(8). doi: 10.1101/lm.054017.124. Print 2024 Aug.
4
Topographically selective motor inhibition under threat of pain.
Pain. 2024 Dec 1;165(12):2851-2862. doi: 10.1097/j.pain.0000000000003301. Epub 2024 Jun 25.
5
Pupil dilation reflects effortful action invigoration in overcoming aversive Pavlovian biases.
Cogn Affect Behav Neurosci. 2024 Aug;24(4):720-739. doi: 10.3758/s13415-024-01191-y. Epub 2024 May 21.
6
Leveraging individual differences in cue-reward learning to investigate the psychological and neural basis of shared psychiatric symptomatology: The sign-tracker/goal-tracker model.
Behav Neurosci. 2024 Aug;138(4):260-271. doi: 10.1037/bne0000590. Epub 2024 May 16.
7
Decisional brain of lawyers at the workplace. A neurolaw pilot study.
Cogn Neurodyn. 2024 Apr;18(2):461-471. doi: 10.1007/s11571-023-10020-w. Epub 2023 Nov 10.
8
Focused stimulation of dorsal versus ventral subthalamic nucleus enhances action-outcome learning in patients with Parkinson's disease.
Brain Commun. 2024 Apr 2;6(2):fcae111. doi: 10.1093/braincomms/fcae111. eCollection 2024.
9
Multiple and subject-specific roles of uncertainty in reward-guided decision-making.
bioRxiv. 2024 Sep 12:2024.03.27.587016. doi: 10.1101/2024.03.27.587016.
10
Craving money? Evidence from the laboratory and the field.
Sci Adv. 2024 Jan 12;10(2):eadi5034. doi: 10.1126/sciadv.adi5034.