What is reinforced by phasic dopamine signals?

Authors

Redgrave Peter, Gurney Kevin, Reynolds John

Affiliation

Department of Psychology, University of Sheffield, Western Bank, Sheffield, S10 2TP, UK.

Publication information

Brain Res Rev. 2008 Aug;58(2):322-39. doi: 10.1016/j.brainresrev.2007.10.007. Epub 2007 Oct 26.

Abstract

The basal ganglia have been associated with processes of reinforcement learning. A strong line of supporting evidence comes from the recording of dopamine (DA) neurones in behaving monkeys. Unpredicted, biologically salient events, including rewards, cause a stereotypic short-latency (70-100 ms), short-duration (100-200 ms) burst of DA activity - the phasic response. This response is widely considered to represent reward prediction errors used as teaching signals in appetitive learning to promote actions that will maximise future reward acquisition. For DA signalling to perform this function, sensory processing afferent to DA neurones should discriminate unpredicted reward-related events. However, the comparative response latencies of DA neurones and orienting gaze-shifts indicate that phasic DA responses are triggered by pre-attentive sensory processing. Consequently, in circumstances where biologically salient events are both spatially and temporally unpredictable, it is unlikely their identity will be known at the time of DA signalling. The limited quality of afferent sensory processing and the precise timing of phasic DA signals suggest that they may play a less direct role in 'Law of Effect' appetitive learning. Rather, the 'time-stamp' nature of the phasic response, in conjunction with the other signals likely to be present in the basal ganglia at the time of phasic DA input, suggests it may reinforce the discovery of unpredicted sensory events for which the organism is responsible. Furthermore, DA-promoted repetition of preceding actions/movements should enable the system to converge on those aspects of context and behavioural output that lead to the discovery of novel actions.
