University College London, 17 Queen Square, London, WC1N 3AR, United Kingdom.
J Cogn Neurosci. 2011 Dec;23(12):3933-8. doi: 10.1162/jocn_a_00090. Epub 2011 Jul 7.
Two fundamental questions underlie the expression of behavior, namely what to do and how vigorously to do it. The former is the topic of an overwhelming wealth of theoretical and empirical work, particularly in the fields of reinforcement learning and decision-making, with various forms of affective prediction error playing key roles. Although vigor concerns motivation, and so is the subject of many empirical studies in diverse fields, it has suffered a dearth of computational models. Recently, Niv et al. [Niv, Y., Daw, N. D., Joel, D., & Dayan, P. Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology (Berlin), 191, 507-520, 2007] suggested that vigor should be controlled by the opportunity cost of time, which is itself determined by the average rate of reward. This coupling of reward rate and vigor can be shown to be optimal under the theory of average return reinforcement learning for a particular class of tasks but may also be a more general, perhaps hard-wired, characteristic of the architecture of control. We therefore tested the hypothesis that healthy human participants would adjust their reaction times (RTs) on the basis of the average rate of reward. We measured RTs in an odd-ball discrimination task for rewards whose magnitudes varied slowly but systematically. Linear regression on the subjects' individual RTs, using the time-varying average rate of reward as the regressor of interest and including nuisance regressors such as the immediate reward in a round and in the preceding round, showed that a significant fraction of the variance in subjects' RTs could indeed be explained by the rate of experienced reward. This validates one of the key proposals associated with the model, illuminating an apparently mandatory form of coupling that may involve tonic levels of dopamine.
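The regression described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the time-varying average reward rate was computed, so the exponentially weighted running average used here, its learning rate `alpha`, the per-trial (rather than per-second) definition of the rate, and all function and variable names are assumptions made for illustration only, not the authors' method.

```python
import numpy as np


def running_reward_rate(rewards, alpha=0.1):
    """Exponentially weighted average of previously experienced rewards.

    Assumed form: the abstract does not specify the estimator or alpha.
    The value at trial t uses only rewards from trials before t.
    """
    rewards = np.asarray(rewards, dtype=float)
    rate = np.zeros(len(rewards))
    avg = 0.0
    for t, r in enumerate(rewards):
        rate[t] = avg                # estimate available before trial t
        avg += alpha * (r - avg)     # update after experiencing reward r
    return rate


def fit_rt_regression(rts, rewards, alpha=0.1):
    """Ordinary least squares of one subject's RTs on the average reward
    rate, with current and previous reward as nuisance regressors."""
    rts = np.asarray(rts, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    avg_rate = running_reward_rate(rewards, alpha)
    # Reward in the preceding round; the first round has none, so use 0.
    prev_reward = np.concatenate(([0.0], rewards[:-1]))
    design = np.column_stack(
        [np.ones_like(rts), avg_rate, rewards, prev_reward]
    )
    beta, *_ = np.linalg.lstsq(design, rts, rcond=None)
    names = ["intercept", "avg_reward_rate", "reward_now", "reward_prev"]
    return dict(zip(names, beta))


if __name__ == "__main__":
    # Synthetic data only: RTs shorten as the average reward rate rises.
    rng = np.random.default_rng(0)
    rewards = rng.uniform(0.0, 10.0, size=200)
    rate = running_reward_rate(rewards)
    rts = 500.0 - 20.0 * rate + rng.normal(0.0, 5.0, size=200)
    print(fit_rt_regression(rts, rewards))
```

Under the hypothesis tested in the paper, the coefficient on `avg_reward_rate` would be negative (faster responses when the average reward rate is higher), over and above any effect of the immediate and preceding rewards.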