A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.

Institute of Physiology and Program in Neuroscience, University of Fribourg, Switzerland.

Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.

This study investigated how the simulated response of dopamine neurons to reward-related stimuli could be used as reinforcement signal for learning a spatial delayed response task. Spatial delayed response tasks assess the functions of frontal cortex and basal ganglia in short-term memory, movement preparation and expectation of environmental events. In these tasks, a stimulus appears for a short period at a particular location, and after a delay the subject moves to the location indicated. Dopamine neurons are activated by unpredicted rewards and reward-predicting stimuli, are not influenced by fully predicted rewards, and are depressed by omitted rewards. Thus, they appear to report an error in the prediction of reward, which is the crucial reinforcement term in formal learning theories. Theoretical studies on reinforcement learning have shown that signals similar to dopamine responses can be used as effective teaching signals for learning. A neural network model implementing the temporal difference algorithm was trained to perform a simulated spatial delayed response task. The reinforcement signal was modeled according to the basic characteristics of dopamine responses to novel stimuli, primary rewards and reward-predicting stimuli. A Critic component analogous to dopamine neurons computed a temporal error in the prediction of reinforcement and emitted this signal to an Actor component which mediated the behavioral output. The spatial delayed response task was learned via two subtasks introducing spatial choices and temporal delays, in the same manner as monkeys in the laboratory. In all three tasks, the reinforcement signal of the Critic developed in a similar manner to the responses of natural dopamine neurons in comparable learning situations, and the learning curves of the Actor replicated the progress of learning observed in the animals. 
Several manipulations demonstrated further the efficacy of the particular characteristics of the dopamine-like reinforcement signal. Omission of reward induced a phasic reduction of the reinforcement signal at the time of the reward and led to extinction of learned actions. A reinforcement signal without prediction error resulted in impaired learning because of perseverative errors. Loss of learned behavior was seen with sustained reductions of the reinforcement signal, a situation in general comparable to the loss of dopamine innervation in Parkinsonian patients and experimentally lesioned animals. The striking similarities in teaching signals and learning behavior between the computational and biological results suggest that dopamine-like reward responses may serve as effective teaching signals for learning behavioral tasks that are typical for primate cognitive behavior, such as spatial delayed responding.
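
The prediction-error behavior described above can be illustrated with a minimal temporal difference sketch. This is not the network from the paper: it assumes a simple tapped-delay-line representation of the time since stimulus onset, a discount factor of 1, and hypothetical trial timing (`cs_step`, `reward_step`, `n_steps`) and learning rate (`alpha`) chosen only for illustration. It models the Critic alone and omits the Actor. The error at each step is computed as delta = r + gamma * V(next) - V(current).

```python
def run_trial(w, reward=1.0, learn=True, gamma=1.0, alpha=0.2,
              cs_step=2, reward_step=8, n_steps=12):
    """Run one trial and return the TD error registered at each time step.

    w[k] is the predicted future reward k steps after stimulus onset.
    All timing values and rates are illustrative assumptions.
    """
    def value(t):
        # No prediction before the stimulus (its onset is unpredicted)
        # and none after the trial ends.
        if t < cs_step or t >= n_steps:
            return 0.0
        return w[t - cs_step]

    deltas = [0.0] * n_steps
    for t in range(n_steps - 1):
        r = reward if t + 1 == reward_step else 0.0
        # Temporal difference error for the transition t -> t + 1.
        delta = r + gamma * value(t + 1) - value(t)
        deltas[t + 1] = delta  # registered on arrival at step t + 1
        if learn and t >= cs_step:
            w[t - cs_step] += alpha * delta
    return deltas


weights = [0.0] * 10  # one weight per step from stimulus onset to trial end

before = run_trial(weights, learn=False)               # naive model
for _ in range(500):                                   # rewarded training trials
    run_trial(weights)
after = run_trial(weights, learn=False)                # trained, reward delivered
omitted = run_trial(weights, reward=0.0, learn=False)  # trained, reward omitted

print("naive:   stimulus %.2f, reward %.2f" % (before[2], before[8]))
print("trained: stimulus %.2f, reward %.2f" % (after[2], after[8]))
print("omitted: stimulus %.2f, reward %.2f" % (omitted[2], omitted[8]))
```

Under these assumptions the error signal is positive at the time of the unpredicted reward before training, shifts to stimulus onset after training while vanishing at the fully predicted reward, and turns negative at the expected reward time when the reward is omitted, which matches the three response properties listed in the abstract.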

Similar articles

A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.

Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.

Predictive reward signal of dopamine neurons.

J Neurophysiol. 1998 Jul;80(1):1-27. doi: 10.1152/jn.1998.80.1.1.

Learning of sequential movements by neural network model with dopamine-like reinforcement signal.

Exp Brain Res. 1998 Aug;121(3):350-4. doi: 10.1007/s002210050467.

Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task.

J Neurosci. 1993 Mar;13(3):900-13. doi: 10.1523/JNEUROSCI.13-03-00900.1993.

Reward-dependent learning in neuronal networks for planning and decision making.

Prog Brain Res. 2000;126:217-29. doi: 10.1016/S0079-6123(00)26016-0.

Modeling functions of striatal dopamine modulation in learning and planning.

Neuroscience. 2001;103(1):65-85. doi: 10.1016/s0306-4522(00)00554-6.

Involvement of basal ganglia and orbitofrontal cortex in goal-directed behavior.

Prog Brain Res. 2000;126:193-215. doi: 10.1016/S0079-6123(00)26015-9.

Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

PLoS Comput Biol. 2013 Apr;9(4):e1003024. doi: 10.1371/journal.pcbi.1003024. Epub 2013 Apr 11.

Anticipatory reward signals in ventral striatal neurons of behaving rats.

Eur J Neurosci. 2008 Nov;28(9):1849-66. doi: 10.1111/j.1460-9568.2008.06480.x.

Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies.

J Neurosci. 2017 Aug 30;37(35):8315-8329. doi: 10.1523/JNEUROSCI.1221-17.2017. Epub 2017 Jul 24.

Cited by

Fluorescence detection of dopamine signaling to the primate striatum in relation to stimulus-reward associations.

Proc Natl Acad Sci U S A. 2025 Mar 18;122(11):e2426861122. doi: 10.1073/pnas.2426861122. Epub 2025 Mar 13.

"Actor-critic" dichotomous hyperactivation and hypoconnectivity in obsessive-compulsive disorder.

Neuroimage Clin. 2025;45:103729. doi: 10.1016/j.nicl.2024.103729. Epub 2024 Dec 31.

Astrocyte D1/D5 Dopamine Receptors Govern Non-Hebbian Long-Term Potentiation at Sensory Synapses onto Lamina I Spinoparabrachial Neurons.

J Neurosci. 2024 Aug 7;44(32):e0170242024. doi: 10.1523/JNEUROSCI.0170-24.2024.

Songbird mesostriatal dopamine pathways are spatially segregated before the onset of vocal learning.

PLoS One. 2023 Nov 16;18(11):e0285652. doi: 10.1371/journal.pone.0285652. eCollection 2023.

Food reinforcement architecture: A framework for impulsive and compulsive overeating and food abuse.

Obesity (Silver Spring). 2023 Jul;31(7):1734-1744. doi: 10.1002/oby.23792.

Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model.

Nat Neurosci. 2023 May;26(5):830-839. doi: 10.1038/s41593-023-01310-x. Epub 2023 Apr 20.

Thalamocortical contribution to flexible learning in neural systems.

Netw Neurosci. 2022 Oct 1;6(4):980-997. doi: 10.1162/netn_a_00235. eCollection 2022.

Efficient coding of cognitive variables underlies dopamine response and choice behavior.

Nat Neurosci. 2022 Jun;25(6):738-748. doi: 10.1038/s41593-022-01085-7. Epub 2022 Jun 6.

Choice-selective sequences dominate in cortical relative to thalamic inputs to NAc to support reinforcement learning.

Cell Rep. 2022 May 17;39(7):110756. doi: 10.1016/j.celrep.2022.110756.

Spatial preferences account for inter-animal variability during the continual learning of a dynamic cognitive task.

Cell Rep. 2022 Apr 19;39(3):110708. doi: 10.1016/j.celrep.2022.110708.
