Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford OX1 3UD, UK.
Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, New York 14627, USA.
Nat Commun. 2016 Aug 1;7:12327. doi: 10.1038/ncomms12327.
In many natural environments the value of a choice gradually gets better or worse as circumstances change. Discerning such trends makes predicting future choice values possible. We show that humans track such trends by comparing estimates of recent and past reward rates, which they are able to hold simultaneously in the dorsal anterior cingulate cortex (dACC). Comparison of recent and past reward rates with positive and negative decision weights is reflected by opposing dACC signals indexing these quantities. The relative strengths of time-linked reward representations in dACC predict whether subjects persist in their current behaviour or switch to an alternative. Computationally, trend-guided choice can be modelled by using a reinforcement-learning mechanism that computes a longer-term estimate (or expectation) of prediction errors. Using such a model, we find a relative predominance of expected prediction errors in dACC, instantaneous prediction errors in the ventral striatum and choice signals in the ventromedial prefrontal cortex.
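The learning mechanism summarised above can be illustrated with a minimal sketch. It is not the authors' fitted model: the class name `TrendLearner`, the learning rates `alpha` and `beta`, the `trend_weight` parameter and the simple additive stay/switch rule are all assumptions made here for illustration. The sketch only shows the general idea of keeping a slower running average of prediction errors (an "expected prediction error") alongside a standard value estimate, so that a positive average signals an improving trend and a negative one a worsening trend.

```python
from dataclasses import dataclass


@dataclass
class TrendLearner:
    """Toy learner: a value estimate plus a slower average of prediction errors.

    All parameter names and default values are illustrative assumptions.
    """
    alpha: float = 0.3        # learning rate for the value estimate (assumed)
    beta: float = 0.1         # learning rate for the expected prediction error (assumed)
    value: float = 0.0        # current estimate of the reward rate
    expected_pe: float = 0.0  # longer-term average of prediction errors

    def update(self, reward: float) -> float:
        """Process one reward and return the instantaneous prediction error."""
        pe = reward - self.value                                  # instantaneous prediction error
        self.value += self.alpha * pe                             # standard delta-rule value update
        self.expected_pe += self.beta * (pe - self.expected_pe)   # running average of prediction errors
        return pe

    def should_stay(self, alternative_value: float, trend_weight: float = 1.0) -> bool:
        """Stay if the trend-adjusted value exceeds the alternative (assumed decision rule)."""
        return self.value + trend_weight * self.expected_pe > alternative_value


def run(rewards, initial_value: float = 0.0) -> TrendLearner:
    """Feed a reward sequence to a fresh learner and return it."""
    learner = TrendLearner(value=initial_value)
    for r in rewards:
        learner.update(r)
    return learner


if __name__ == "__main__":
    rising = run([0.1 * i for i in range(1, 11)])
    falling = run([1.0 - 0.1 * i for i in range(1, 11)], initial_value=1.0)
    print("rising trend, expected PE:", rising.expected_pe)
    print("falling trend, expected PE:", falling.expected_pe)
```

In this sketch a steadily improving reward sequence leaves the value estimate lagging behind the rewards, so every prediction error is positive and their running average biases the learner towards persisting; a steadily worsening sequence does the opposite and biases it towards switching.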