Department of Psychology, University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA 94704, USA.
Department of Psychology, Princeton University, South Drive, Princeton, NJ 08540, USA.
Curr Biol. 2019 May 20;29(10):1606-1613.e5. doi: 10.1016/j.cub.2019.04.011. Epub 2019 May 2.
Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should be sensitive not only to whether the choice itself was suboptimal but also to whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this, we used a modified version of a classic reinforcement learning task in which feedback indicated whether negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked whether prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors than when execution was successful but reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction errors in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
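The idea of an attenuated negative prediction error after an execution failure can be illustrated with a minimal sketch. This is not the model used in the study, which the abstract does not specify; it is a generic delta-rule value update in which the learning rate depends on the type of outcome. The function name `update_value` and all three learning-rate values are hypothetical and chosen only for illustration.

```python
def update_value(value, reward, execution_error,
                 alpha_reward=0.3, alpha_selection=0.3, alpha_execution=0.1):
    """Delta-rule update with outcome-specific learning rates.

    All parameter values are illustrative assumptions, not estimates
    from the study.
    """
    # Reward prediction error: obtained reward minus expected value.
    delta = reward - value

    if reward > 0:
        alpha = alpha_reward        # rewarded trial
    elif execution_error:
        alpha = alpha_execution     # no reward because the action failed
    else:
        alpha = alpha_selection     # action succeeded, reward withheld

    return value + alpha * delta, delta


# Same unrewarded outcome, attributed to two different causes.
v_after_execution_error, _ = update_value(0.5, 0.0, execution_error=True)
v_after_selection_error, _ = update_value(0.5, 0.0, execution_error=False)
```

With a smaller learning rate after execution errors, the chosen option's value drops less than it does after a withheld reward, which is one simple way to express greater tolerance of non-rewarded outcomes caused by execution failures.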