Department of Psychology, Center for Studies in Behavioral Neurobiology/Groupe de recherche en neurobiologie comportementale, Concordia University, Montreal, Quebec, Canada.
Department of Psychology, Brooklyn College of the City University of New York, Brooklyn, NY, USA.
Sci Rep. 2019 Apr 12;9(1):5962. doi: 10.1038/s41598-019-42244-4.
Temporal-difference (TD) learning models afford the neuroscientist a theory-driven roadmap in the quest for the neural mechanisms of reinforcement learning. The application of these models to understanding the role of phasic midbrain dopaminergic responses in reward prediction learning constitutes one of the greatest success stories in behavioural and cognitive neuroscience. Critically, the classic learning paradigms associated with TD are poorly suited to cast light on its neural implementation, thus hampering progress. Here, we present a serial blocking paradigm in rodents that overcomes these limitations and allows for the simultaneous investigation of two cardinal TD tenets; namely, that learning depends on the computation of a prediction error, and that reinforcing value, whether intrinsic or acquired, propagates back to the onset of the earliest reliable predictor. The implications of this paradigm for the neural exploration of TD mechanisms are highlighted.
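To make the two cardinal tenets concrete, here is a minimal TD(0) sketch, not taken from the paper, using a hypothetical two-stimulus serial chain (A precedes B, B precedes reward) with assumed learning rate and discount parameters. It shows the prediction-error update and how acquired value propagates back to the earliest reliable predictor.

```python
# Minimal TD(0) sketch over a hypothetical serial stimulus chain A -> B -> reward.
ALPHA = 0.1   # learning rate (assumed)
GAMMA = 1.0   # discount factor (assumed)

states = ["A", "B"]           # serial predictors: A precedes B, B precedes reward
V = {s: 0.0 for s in states}  # state values, initialised to zero

for trial in range(200):
    for i, s in enumerate(states):
        # Value of the successor state; the terminal state (reward delivery) has value 0.
        next_v = V[states[i + 1]] if i + 1 < len(states) else 0.0
        # A reward of 1 arrives only after the last predictor in the chain.
        r = 1.0 if i == len(states) - 1 else 0.0
        # Prediction error: the first cardinal TD tenet.
        delta = r + GAMMA * next_v - V[s]
        V[s] += ALPHA * delta

# After training, V["A"] approaches 1: reinforcing value has propagated
# back to the onset of the earliest reliable predictor (the second tenet).
print(V)
```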