Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans.
Pessiglione Mathias, Seymour Ben, Flandin Guillaume, Dolan Raymond J, Frith Chris D
Wellcome Department of Imaging Neuroscience, 12 Queen Square, London WC1N 3BG, UK.
Nature. 2006 Aug 31;442(7106):1042-5. doi: 10.1038/nature05051. Epub 2006 Aug 23.
Theories of instrumental learning are centred on understanding how success and failure are used to improve future decisions. These theories highlight a central role for reward prediction errors in updating the values associated with available actions. In animals, substantial evidence indicates that the neurotransmitter dopamine might have a key function in this type of learning, through its ability to modulate cortico-striatal synaptic efficacy. However, no direct evidence links dopamine, striatal activity and behavioural choice in humans. Here we show that, during instrumental learning, the magnitude of reward prediction error expressed in the striatum is modulated by the administration of drugs enhancing (3,4-dihydroxy-L-phenylalanine; L-DOPA) or reducing (haloperidol) dopaminergic function. Accordingly, subjects treated with L-DOPA have a greater propensity to choose the most rewarding action relative to subjects treated with haloperidol. Furthermore, incorporating the magnitude of the prediction errors into a standard action-value learning algorithm accurately reproduced subjects' behavioural choices under the different drug conditions. We conclude that dopamine-dependent modulation of striatal activity can account for how the human brain uses reward prediction errors to improve future decisions.
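To make the modelling described above concrete, here is a minimal sketch of the kind of standard action-value learning algorithm the abstract refers to: a delta-rule update driven by the reward prediction error, with softmax action selection. This is not the authors' fitted model; the function name, parameter values, and the `reward_scale` factor used to caricature enhanced versus reduced dopaminergic reinforcement are illustrative assumptions.

```python
import numpy as np

def simulate_instrumental_learning(
    reward_probs=(0.8, 0.2),   # assumed reward probabilities for the two actions
    n_trials=60,
    alpha=0.3,                 # learning rate (illustrative value)
    beta=3.0,                  # softmax inverse temperature (illustrative value)
    reward_scale=1.0,          # >1 caricatures enhanced dopaminergic reinforcement (e.g. L-DOPA),
                               # <1 reduced reinforcement (e.g. haloperidol); an assumption, not the fitted model
    rng=None,
):
    """Delta-rule action-value learning with softmax choice.

    Q-values are updated by the reward prediction error
        delta = r - Q[a],   Q[a] <- Q[a] + alpha * delta,
    and actions are chosen with probability proportional to exp(beta * Q).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_actions = len(reward_probs)
    q = np.zeros(n_actions)
    choices, prediction_errors = [], []

    for _ in range(n_trials):
        # Softmax action selection over current action values.
        logits = beta * q
        p = np.exp(logits - np.max(logits))
        p /= p.sum()
        a = rng.choice(n_actions, p=p)

        # Probabilistic reward; its effective magnitude is scaled to mimic a
        # stronger or weaker dopaminergic teaching signal (assumption).
        r = reward_scale * float(rng.random() < reward_probs[a])

        delta = r - q[a]           # reward prediction error
        q[a] += alpha * delta      # action-value update
        choices.append(a)
        prediction_errors.append(delta)

    return np.array(choices), np.array(prediction_errors)


if __name__ == "__main__":
    # A larger reward_scale makes the simulated agent converge faster on the
    # richer action, qualitatively echoing the L-DOPA vs haloperidol contrast.
    for label, scale in [("enhanced", 1.5), ("reduced", 0.5)]:
        choices, _ = simulate_instrumental_learning(
            reward_scale=scale, rng=np.random.default_rng(0)
        )
        print(label, "- proportion of choices of the richer action:", (choices == 0).mean())
```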