A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task.

Author information

Suri R E, Schultz W

Affiliation

Institute of Physiology and Program in Neuroscience, University of Fribourg, Switzerland.

Publication information

Neuroscience. 1999;91(3):871-90. doi: 10.1016/s0306-4522(98)00697-6.

Abstract

This study investigated how the simulated response of dopamine neurons to reward-related stimuli could be used as a reinforcement signal for learning a spatial delayed response task. Spatial delayed response tasks assess the functions of frontal cortex and basal ganglia in short-term memory, movement preparation and expectation of environmental events. In these tasks, a stimulus appears for a short period at a particular location, and after a delay the subject moves to the location indicated. Dopamine neurons are activated by unpredicted rewards and reward-predicting stimuli, are not influenced by fully predicted rewards, and are depressed by omitted rewards. Thus, they appear to report an error in the prediction of reward, which is the crucial reinforcement term in formal learning theories. Theoretical studies on reinforcement learning have shown that signals similar to dopamine responses can be used as effective teaching signals for learning. A neural network model implementing the temporal difference algorithm was trained to perform a simulated spatial delayed response task. The reinforcement signal was modeled according to the basic characteristics of dopamine responses to novel stimuli, primary rewards and reward-predicting stimuli. A Critic component analogous to dopamine neurons computed a temporal error in the prediction of reinforcement and emitted this signal to an Actor component which mediated the behavioral output. The spatial delayed response task was learned via two subtasks introducing spatial choices and temporal delays, in the same manner as monkeys in the laboratory. In all three tasks, the reinforcement signal of the Critic developed in a similar manner to the responses of natural dopamine neurons in comparable learning situations, and the learning curves of the Actor replicated the progress of learning observed in the animals.
Several manipulations further demonstrated the efficacy of the particular characteristics of the dopamine-like reinforcement signal. Omission of reward induced a phasic reduction of the reinforcement signal at the time of the reward and led to extinction of learned actions. A reinforcement signal without prediction error resulted in impaired learning because of perseverative errors. Loss of learned behavior was seen with sustained reductions of the reinforcement signal, a situation broadly comparable to the loss of dopamine innervation in Parkinsonian patients and experimentally lesioned animals. The striking similarities in teaching signals and learning behavior between the computational and biological results suggest that dopamine-like reward responses may serve as effective teaching signals for learning behavioral tasks that are typical of primate cognitive behavior, such as spatial delayed responding.
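
The prediction-error behavior summarized in the abstract can be illustrated with a minimal tabular temporal-difference Critic. This is a sketch under stated assumptions, not the authors' network: the five-step cue-delay-reward trial, the function name `run_trial`, the learning rate `alpha`, and the discount factor `gamma` are all hypothetical choices made for illustration, and the Actor is omitted entirely.

```python
def run_trial(values, reward, alpha=0.1, gamma=1.0, learn=True):
    """Run one cue-delay-reward trial and return the TD error per transition.

    values[i] is the predicted future reward in within-trial state i
    (state 0 is cue onset). The pre-cue baseline has a fixed value of 0
    because the cue is assumed to arrive unpredictably, so deltas[0] is
    the response to the cue and deltas[-1] is the response at reward time.
    """
    n = len(values)
    deltas = [0.0] * (n + 1)
    # Transition from the pre-cue baseline (value 0, never updated) to the cue.
    deltas[0] = gamma * values[0] - 0.0
    for i in range(n):
        next_value = values[i + 1] if i + 1 < n else 0.0  # trial ends after the last state
        r = reward if i == n - 1 else 0.0                 # reward arrives on leaving the last state
        deltas[i + 1] = r + gamma * next_value - values[i]
        if learn:
            values[i] += alpha * deltas[i + 1]            # TD(0) update of the Critic
    return deltas


values = [0.0] * 5                                        # hypothetical 5-step trial
first = run_trial(values, reward=1.0)                     # reward is still unpredicted
for _ in range(2000):
    run_trial(values, reward=1.0)                         # train until the reward is predicted
trained = run_trial(values, reward=1.0, learn=False)
omitted = run_trial(values, reward=0.0, learn=False)      # reward withheld after training

print("first trial, reward time:", round(first[-1], 3))
print("trained, cue time:", round(trained[0], 3))
print("trained, reward time:", round(trained[-1], 3))
print("omitted reward, reward time:", round(omitted[-1], 3))
```

In this toy setting the error signal is positive at an unpredicted reward, moves to the reward-predicting cue once the trial is learned, stays near zero at a fully predicted reward, and turns negative when a predicted reward is withheld, which matches the qualitative pattern the abstract attributes to dopamine neurons. An Actor in the sense of the abstract would use the same error to adjust its action preferences.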
