Department of Physiology, Kyoto Prefectural University of Medicine, Kawaramachi-Hirokoji, Kamigyo-ku, Kyoto 602-8566, Japan.
Eur J Neurosci. 2011 Aug;34(3):489-506. doi: 10.1111/j.1460-9568.2011.07771.x. Epub 2011 Jul 22.
Humans and animals optimize their behavior by evaluating the outcomes of individual actions and predicting how much reward those actions will yield. While the estimated values of actions guide choice behavior, choices are also governed by other behavioral norms, such as rules and strategies. Values, rules and strategies are represented in neuronal activity, and the striatum is one of the brain loci best qualified as a site where these signals meet. To understand the role of the striatum in value- and strategy-based decision-making, we recorded striatal neurons in macaque monkeys performing a behavioral task in which they searched for a reward target by trial-and-error among three alternatives, earned a reward for a target choice, and then earned additional rewards for choosing the same target. This task allowed us to examine whether and how the values of targets and the task strategy, defined as negative-then-search and positive-then-repeat (or win-stay-lose-switch), are represented in the striatum. Large subsets of striatal neurons encoded positive and negative outcome feedback for individual decisions and actions. Once monkeys made a choice, signals related to the chosen actions, their values, and search- or repeat-type actions increased and persisted until the outcome feedback appeared. Subsets of neurons exhibited a tonic increase in activity after search and repeat choices that followed negative and positive feedback, respectively, in the preceding trials, in line with the task strategy the monkeys had adopted. These activity profiles, as a heterogeneous representation of decision variables, may underlie part of the process for reinforcement- and strategy-based evaluation of selected actions in the striatum.
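The negative-then-search and positive-then-repeat strategy described above can be illustrated with a minimal sketch. This is not the authors' task code or analysis: the function names, the `n_rewards` block length, and the assumption that a searching agent never revisits a target already tried in the current block are all hypothetical choices made for illustration.

```python
import random


def next_choice(last_choice, last_rewarded, tried, n_targets=3, rng=random):
    """Pick the next target under a win-stay-lose-switch rule (illustrative)."""
    if last_choice is not None and last_rewarded:
        # Positive-then-repeat: stay with the target that was just rewarded.
        return last_choice
    # Negative-then-search: switch to a target not yet tried in this block
    # (assumption: the agent remembers which targets it has already tried).
    untried = [t for t in range(n_targets) if t not in tried]
    return rng.choice(untried)


def run_block(reward_target, n_rewards=3, n_targets=3, seed=0):
    """Simulate one block until the target has been rewarded n_rewards times.

    n_rewards is a hypothetical parameter: one reward for finding the target
    plus the additional rewards for repeating it.
    """
    rng = random.Random(seed)
    tried, history = set(), []
    choice, rewarded = None, False
    while sum(r for _, r in history) < n_rewards:
        choice = next_choice(choice, rewarded, tried, n_targets, rng)
        tried.add(choice)
        rewarded = choice == reward_target
        history.append((choice, rewarded))
    return history


if __name__ == "__main__":
    for trial, (choice, rewarded) in enumerate(run_block(reward_target=2), 1):
        label = "repeat" if rewarded else "search"
        print(f"trial {trial}: chose {choice}, rewarded={rewarded}, next={label}")
```

With three alternatives, an agent following this rule makes at most two unrewarded search choices before finding the target, after which every choice is a repeat.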