The misbehavior of value and the discipline of the will.

Authors

Dayan Peter, Niv Yael, Seymour Ben, Daw Nathaniel D

Affiliation

Gatsby Computational Neuroscience Unit, UCL, 17 Queen Square, London, UK.

Publication information

Neural Netw. 2006 Oct;19(8):1153-60. doi: 10.1016/j.neunet.2006.03.002. Epub 2006 Aug 30.

DOI:10.1016/j.neunet.2006.03.002

PMID:16938432

Abstract

Most reinforcement learning models of animal conditioning operate under the convenient, though fictive, assumption that Pavlovian conditioning concerns prediction learning whereas instrumental conditioning concerns action learning. However, it is only through Pavlovian responses that Pavlovian prediction learning is evident, and these responses can act against the instrumental interests of the subjects. This can be seen in both experimental and natural circumstances. In this paper we study the consequences of importing this competition into a reinforcement learning context, and demonstrate the resulting effects in an omission schedule and a maze navigation task. The misbehavior created by Pavlovian values can be quite debilitating; we discuss how it may be disciplined.
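
The competition described in the abstract can be illustrated with a toy simulation of an omission schedule. The sketch below is not the authors' model. It is a minimal illustration under stated assumptions: the function name `simulate_omission`, the weight `omega`, the learning rate `alpha`, the inverse temperature `beta`, and the rule that a Pavlovian state value adds directly to the propensity to peck are all hypothetical choices made for this example. Pecking cancels the reward, so the instrumentally correct action is to withhold.

```python
import math
import random


def simulate_omission(omega, n_trials=2000, alpha=0.1, beta=5.0, seed=0):
    """Toy omission schedule: pecking cancels reward, withholding earns it.

    omega (hypothetical parameter) scales how strongly the learned
    Pavlovian value of the cue boosts the propensity to peck.
    Returns the mean reward per trial.
    """
    rng = random.Random(seed)
    q_peck, q_withhold = 0.0, 0.0  # instrumental action values
    v = 0.0                        # Pavlovian value of the cue
    total_reward = 0.0

    for _ in range(n_trials):
        # Propensity to peck: instrumental value plus a Pavlovian bonus,
        # compared against the instrumental value of withholding.
        drive = beta * (q_peck + omega * v - q_withhold)
        p_peck = 1.0 / (1.0 + math.exp(-drive))
        peck = rng.random() < p_peck

        # Omission contingency: a peck forfeits the reward.
        reward = 0.0 if peck else 1.0

        # Instrumental learning updates only the chosen action.
        if peck:
            q_peck += alpha * (reward - q_peck)
        else:
            q_withhold += alpha * (reward - q_withhold)

        # Pavlovian learning tracks reward following the cue,
        # regardless of which action was taken.
        v += alpha * (reward - v)
        total_reward += reward

    return total_reward / n_trials


if __name__ == "__main__":
    for omega in (0.0, 3.0):
        print(f"omega={omega}: mean reward = {simulate_omission(omega):.3f}")
```

In this sketch, setting `omega` to zero leaves a purely instrumental learner that comes to withhold on most trials, whereas a large `omega` keeps the simulated agent pecking on a substantial fraction of trials and lowers its reward rate, because the more reward the cue predicts, the stronger the pull toward the response that cancels it.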

Similar articles

Magazine approach during a signal for food depends on Pavlovian, not instrumental, conditioning.

J Exp Psychol Anim Behav Process. 2013 Apr;39(2):107-16. doi: 10.1037/a0031315. Epub 2013 Feb 18.

Within-subject effects of number of trials in rat conditioning procedures.

J Exp Psychol Anim Behav Process. 2010 Apr;36(2):217-31. doi: 10.1037/a0016425.

Feeding behavior of Aplysia: a model system for comparing cellular mechanisms of classical and operant conditioning.

Learn Mem. 2006 Nov-Dec;13(6):669-80. doi: 10.1101/lm.339206.

Competition between an avoidance response and a safety signal: evidence for a single learning system.

Biol Psychol. 2013 Jan;92(1):9-16. doi: 10.1016/j.biopsycho.2011.09.007. Epub 2011 Sep 29.

[Activity of neurons in the pedunculopontine nucleus in conditioned instrumental appetitive reflex].

Zh Vyssh Nerv Deiat Im I P Pavlova. 2002 Nov-Dec;52(6):705-15.

Asymmetrical interactions between thirst and hunger in Pavlovian-instrumental transfer.

Q J Exp Psychol B. 1994 May;47(2):211-31.

Summation of reinforcement rates when conditioned stimuli are presented in compound.

J Exp Psychol Anim Behav Process. 2011 Oct;37(4):385-93. doi: 10.1037/a0024553.

Learning not to respond: Role of the hippocampus in withholding responses during omission training.

Behav Brain Res. 2017 Feb 1;318:61-70. doi: 10.1016/j.bbr.2016.11.011. Epub 2016 Nov 9.

Pavlovian to instrumental transfer: a neurobehavioural perspective.

Neurosci Biobehav Rev. 2010 Jul;34(8):1277-95. doi: 10.1016/j.neubiorev.2010.03.007. Epub 2010 Apr 10.

Cited by

Decrease in decision noise from adolescence into adulthood mediates an increase in more sophisticated choice behaviors and performance gain.

PLoS Biol. 2024 Nov 14;22(11):e3002877. doi: 10.1371/journal.pbio.3002877. eCollection 2024 Nov.

Pavlovian impatience: The anticipation of immediate rewards increases approach behaviour.

Cogn Affect Behav Neurosci. 2025 Apr;25(2):358-376. doi: 10.3758/s13415-024-01236-2. Epub 2024 Oct 28.

High stakes slow responding, but do not help overcome Pavlovian biases in humans.

Learn Mem. 2024 Sep 16;31(8). doi: 10.1101/lm.054017.124. Print 2024 Aug.

Topographically selective motor inhibition under threat of pain.

Pain. 2024 Dec 1;165(12):2851-2862. doi: 10.1097/j.pain.0000000000003301. Epub 2024 Jun 25.

Pupil dilation reflects effortful action invigoration in overcoming aversive Pavlovian biases.

Cogn Affect Behav Neurosci. 2024 Aug;24(4):720-739. doi: 10.3758/s13415-024-01191-y. Epub 2024 May 21.

Leveraging individual differences in cue-reward learning to investigate the psychological and neural basis of shared psychiatric symptomatology: The sign-tracker/goal-tracker model.

Behav Neurosci. 2024 Aug;138(4):260-271. doi: 10.1037/bne0000590. Epub 2024 May 16.

Decisional brain of lawyers at the workplace. A neurolaw pilot study.

Cogn Neurodyn. 2024 Apr;18(2):461-471. doi: 10.1007/s11571-023-10020-w. Epub 2023 Nov 10.

Focused stimulation of dorsal versus ventral subthalamic nucleus enhances action-outcome learning in patients with Parkinson's disease.

Brain Commun. 2024 Apr 2;6(2):fcae111. doi: 10.1093/braincomms/fcae111. eCollection 2024.

Multiple and subject-specific roles of uncertainty in reward-guided decision-making.

bioRxiv. 2024 Sep 12:2024.03.27.587016. doi: 10.1101/2024.03.27.587016.

Craving money? Evidence from the laboratory and the field.

Sci Adv. 2024 Jan 12;10(2):eadi5034. doi: 10.1126/sciadv.adi5034.
