Seo Hyojung, Barraclough Dominic J, Lee Daeyeol
Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06510, USA.
Cereb Cortex. 2007 Sep;17 Suppl 1:i110-7. doi: 10.1093/cercor/bhm064. Epub 2007 Jun 4.
Although economic theories based on utility maximization account for a range of choice behaviors, utilities must be estimated through experience. Dynamics of this learning process may account for certain discrepancies between the predictions of economic theories and real choice behaviors of humans and other animals. To understand the neural mechanisms responsible for such adaptive decision making, we trained rhesus monkeys to play a simulated matching pennies game. Small but systematic deviations of the animal's behavior from the optimal strategy were consistent with the predictions of reinforcement learning theory. In addition, individual neurons in the dorsolateral prefrontal cortex (DLPFC) encoded 3 different types of signals that can potentially influence the animal's future choices. First, activity modulated by the animal's previous choices might provide the eligibility trace that can be used to attribute a particular outcome to its causative action. Second, activity related to the animal's rewards in the previous trials might be used to compute an average reward rate. Finally, activity of some neurons was modulated by the computer's choices in the previous trials and may reflect the process of updating the value functions. These results suggest that the DLPFC might be an important node in the cortical network of decision making.
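As an illustration of the kind of reinforcement learning account the abstract refers to, the following is a minimal sketch and not the authors' model. It assumes a two-target task in which the animal is rewarded when its choice matches the computer's, a softmax choice rule, and a simple delta-rule update of the chosen target's value function. The names `alpha` (learning rate), `beta` (inverse temperature), `simulate`, and `win_stay_fraction` are hypothetical, and the computer opponent is simplified here to an unbiased random chooser.

```python
import math
import random


def softmax_prob_right(q_left, q_right, beta):
    """Probability of choosing the right target under a softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (q_right - q_left)))


def simulate(n_trials=1000, alpha=0.2, beta=3.0, seed=0):
    """Simulate a simple value-learning agent in a matching pennies task.

    Assumptions (not taken from the abstract): the opponent chooses
    uniformly at random, and a reward of 1 is delivered on a match.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # value functions for left (0) and right (1)
    choices, rewards = [], []
    for _ in range(n_trials):
        p_right = softmax_prob_right(q[0], q[1], beta)
        choice = 1 if rng.random() < p_right else 0
        computer = rng.randrange(2)  # assumed unbiased random opponent
        reward = 1.0 if choice == computer else 0.0
        # Delta rule: move the chosen target's value toward the outcome.
        q[choice] += alpha * (reward - q[choice])
        choices.append(choice)
        rewards.append(reward)
    return choices, rewards, q


def win_stay_fraction(choices, rewards):
    """Fraction of rewarded trials followed by a repeat of the same choice."""
    stays = wins = 0
    for t in range(len(choices) - 1):
        if rewards[t] == 1.0:
            wins += 1
            stays += choices[t + 1] == choices[t]
    return stays / wins if wins else float("nan")


if __name__ == "__main__":
    choices, rewards, q = simulate()
    print("win-stay fraction:", win_stay_fraction(choices, rewards))
```

Against an unbiased random opponent the optimal strategy is to choose each target with equal probability regardless of history, so a win-stay fraction above 0.5 in this sketch is one example of the kind of systematic deviation that a value-learning rule can produce.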