University of California at Berkeley.
J Cogn Neurosci. 2018 Aug;30(8):1197-1208. doi: 10.1162/jocn_a_01272. Epub 2018 Apr 25.
Reinforcement learning models have proven highly effective for understanding learning in both artificial and biological systems. However, these models have difficulty scaling up to the complexity of real-life environments. One solution is to incorporate the hierarchical structure of behavior. In hierarchical reinforcement learning, primitive actions are chunked together into more temporally abstract actions, called "options," that are reinforced by attaining a subgoal. These subgoals can generate pseudoreward prediction errors, which are distinct from the reward prediction errors associated with the final goal of the behavior. Studies in humans have shown that pseudoreward prediction errors correlate positively with activation of anterior cingulate cortex (ACC). To determine how pseudoreward prediction errors are encoded at the single-neuron level, we trained two animals to perform a primate version of the task used to generate these errors in humans. We recorded the electrical activity of neurons in ACC during performance of this task, as well as neurons in lateral prefrontal cortex and orbitofrontal cortex (OFC). We found that the firing rate of a small population of neurons encoded pseudoreward prediction errors, and these neurons were restricted to ACC. Our results support the idea that ACC may play an important role in encoding subgoals and pseudoreward prediction errors to support hierarchical reinforcement learning. One caveat is that neurons encoding pseudoreward prediction errors were relatively few in number, especially in comparison to neurons that encoded information about the main goal of the task.
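The distinction between the two kinds of prediction error in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' model or analysis. It assumes that value is a discounted function of the number of steps remaining, that an unexpected event moves the subgoal farther away while leaving the total path to the final goal unchanged, and that a prediction error is the resulting change in value. The discount factor, the step counts, and the names `value` and `jump_error` are all hypothetical.

```python
GAMMA = 0.9  # assumed discount factor, not taken from the study


def value(steps_remaining, payoff=1.0, gamma=GAMMA):
    """Discounted value of a payoff that arrives after `steps_remaining` steps."""
    return payoff * gamma ** steps_remaining


def jump_error(steps_before, steps_after):
    """Prediction error from an unexpected change in distance: new value minus old value."""
    return value(steps_after) - value(steps_before)


# Hypothetical event: the subgoal moves from 4 to 6 steps away,
# while the total path to the final goal stays at 10 steps.
pseudoreward_pe = jump_error(steps_before=4, steps_after=6)   # option (subgoal) level
reward_pe = jump_error(steps_before=10, steps_after=10)       # final-goal level

print(f"pseudoreward prediction error: {pseudoreward_pe:+.3f}")
print(f"reward prediction error:       {reward_pe:+.3f}")
```

Under these assumptions the subgoal-level error is negative while the goal-level error is zero, which is the kind of dissociation that lets a signal tied to the subgoal be separated from one tied to the final reward.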