Niv Yael, Daw Nathaniel D, Joel Daphna, Dayan Peter
Interdisciplinary Center for Neural Computation, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel.
Psychopharmacology (Berl). 2007 Apr;191(3):507-20. doi: 10.1007/s00213-006-0502-4. Epub 2006 Oct 10.
Dopamine neurotransmission has long been known to exert a powerful influence over the vigor, strength, or rate of responding. However, there exists no clear understanding of the computational foundation for this effect; predominant accounts of dopamine's computational function focus on a role for phasic dopamine in controlling the discrete selection between different actions and have nothing to say about response vigor or indeed the free-operant tasks in which it is typically measured.
We seek to accommodate free-operant behavioral tasks within the realm of models of optimal control and thereby capture how dopaminergic and motivational manipulations affect response vigor.
We construct an average reward reinforcement learning model in which subjects choose both which action to perform and also the latency with which to perform it. Optimal control balances the costs of acting quickly against the benefits of getting reward earlier and thereby chooses a best response latency.
In this framework, the long-run average rate of reward plays a key role as an opportunity cost and mediates motivational influences on rates and vigor of responding. We review evidence suggesting that the average reward rate is reported by tonic levels of dopamine, putatively in the nucleus accumbens.
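The trade-off between acting quickly and forgoing reward can be illustrated with a minimal sketch. It is not the model from the paper: it assumes, purely for illustration, that performing an action with latency tau incurs a vigor cost of vigor_cost / tau, and that each unit of time spent costs avg_reward_rate in forgone reward (the opportunity cost). The function names and parameter values are hypothetical.

```python
import math


def latency_cost(tau, vigor_cost, avg_reward_rate):
    """Total cost of responding with latency tau (assumed form).

    vigor_cost / tau      : acting faster is more costly (illustrative assumption)
    avg_reward_rate * tau : opportunity cost of time, set by the average reward rate
    """
    return vigor_cost / tau + avg_reward_rate * tau


def optimal_latency(vigor_cost, avg_reward_rate):
    """Closed-form minimizer of latency_cost under the assumed cost form.

    Setting d/dtau (vigor_cost / tau + avg_reward_rate * tau) = 0
    gives tau* = sqrt(vigor_cost / avg_reward_rate).
    """
    return math.sqrt(vigor_cost / avg_reward_rate)


def grid_search_latency(vigor_cost, avg_reward_rate, lo=0.01, hi=20.0, steps=20000):
    """Numerical check: pick the latency on a uniform grid with the lowest cost."""
    taus = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return min(taus, key=lambda tau: latency_cost(tau, vigor_cost, avg_reward_rate))


if __name__ == "__main__":
    for rate in (0.5, 1.0, 2.0, 4.0):
        tau_star = optimal_latency(vigor_cost=4.0, avg_reward_rate=rate)
        print(f"average reward rate {rate:.1f} -> optimal latency {tau_star:.3f}")
```

Under these assumptions, a higher average reward rate raises the cost of time and shortens the optimal latency, which is the qualitative link between the average reward rate and response vigor described above.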
Our extension of reinforcement learning models to free-operant tasks unites psychologically and computationally inspired ideas about the role of tonic dopamine in striatum, explaining from a normative point of view why higher levels of dopamine might be associated with more vigorous responding.