Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, U.S.A.
Neural Comput. 2014 Mar;26(3):467-71. doi: 10.1162/NECO_a_00559. Epub 2013 Dec 9.
Temporal difference learning models of dopamine assert that phasic levels of dopamine encode a reward prediction error. However, this hypothesis has been challenged by recent observations of gradually ramping striatal dopamine levels as a goal is approached. This note describes conditions under which temporal difference learning models predict dopamine ramping. The key idea is representational: a quadratic transformation of proximity to the goal implies approximately linear ramping, as observed experimentally.
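The arithmetic behind the representational claim can be illustrated with a minimal sketch. It is not the paper's model and rests on assumptions not stated in the abstract: the value estimate is taken to be a fixed weight times squared proximity, no reward is delivered before the goal, no learning takes place, and the names `td_errors`, `weight`, and `gamma` are hypothetical.

```python
import numpy as np


def td_errors(proximity, weight=1.0, gamma=1.0):
    """Temporal difference errors along a path toward a goal.

    Assumes (for illustration only) that the value estimate is
    weight * proximity**2 and that no reward arrives before the goal,
    so each error is gamma * V(next position) - V(current position).
    """
    values = weight * proximity ** 2
    return gamma * values[1:] - values[:-1]


# Proximity grows from 0 (far from the goal) to 1 (at the goal) in equal steps.
proximity = np.linspace(0.0, 1.0, 11)

# Without discounting, the error grows by the same amount at every step.
delta = td_errors(proximity)

# With mild discounting (an assumed value of 0.98), the growth is nearly constant.
delta_discounted = td_errors(proximity, gamma=0.98)

print(np.diff(delta))
print(np.diff(delta_discounted))
```

Under these assumptions the undiscounted error is exactly linear in position, because the difference of consecutive squares is linear. A discount factor slightly below 1 adds a small negative quadratic term, which is one way to read "approximately linear" in the abstract.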