Department of Neurobiology, Duke University School of Medicine and Center for Neuroeconomic Studies, Duke University, Durham, NC 27710, USA.
Curr Biol. 2009 Sep 29;19(18):1532-7. doi: 10.1016/j.cub.2009.07.048. Epub 2009 Sep 3.
In dynamic environments, adaptive behavior requires striking a balance between harvesting currently available rewards (exploitation) and gathering information about alternative options (exploration). Such strategic decisions should incorporate not only recent reward history, but also opportunity costs and environmental statistics. Previous neuroimaging and neurophysiological studies have implicated orbitofrontal cortex, anterior cingulate cortex, and ventral striatum in distinguishing between bouts of exploration and exploitation. Nonetheless, the neuronal mechanisms that underlie strategy selection remain poorly understood. We hypothesized that posterior cingulate cortex (CGp), an area linking reward processing, attention, memory, and motor control systems, mediates the integration of variables such as reward, uncertainty, and target location that underlie this dynamic balance. Here we show that CGp neurons distinguish between exploratory and exploitative decisions made by monkeys in a dynamic foraging task. Moreover, firing rates of these neurons predict in graded fashion the strategy most likely to be selected on upcoming trials. This encoding is distinct from switching between targets and is independent of the absolute magnitudes of rewards. These observations implicate CGp in the integration of individual outcomes across decision making and the modification of strategy in dynamic environments.