MRC Cognition and Brain Sciences Unit, University of Cambridge, CB2 7EF, Cambridge, United Kingdom.
Department of Experimental Psychology, University of Oxford, OX2 6GG, Oxford, United Kingdom.
J Neurosci. 2022 Jan 12;42(2):276-287. doi: 10.1523/JNEUROSCI.1338-21.2021. Epub 2021 Nov 15.
Much animal learning is slow, with cumulative changes in behavior driven by reward prediction errors. When the abstract structure of a problem is known, however, both animals and formal learning models can rapidly attach new items to their roles within this structure, sometimes in a single trial. Frontal cortex is likely to play a key role in this process. To examine information seeking and use in a known problem structure, we trained monkeys in an explore/exploit task, requiring the animal first to test objects for their association with reward, then, once rewarded objects were found, to reselect them on further trials for further rewards. Many cells in the frontal cortex showed an explore/exploit preference aligned with one-shot learning in the monkeys' behavior: the population switched from an explore state to an exploit state after a single trial of learning but partially maintained the explore state if an error indicated that learning had failed. This binary switch from explore to exploit was not explained by continuous changes linked to expectancy or prediction error. Explore/exploit preferences were independent for two stages of the trial: object selection and receipt of feedback. Within an established task structure, frontal activity may control the separate processes of explore and exploit, switching in one trial between the two.
Much animal learning is slow, with cumulative changes in behavior driven by reward prediction errors. When the abstract structure of a problem is known, however, both animals and formal learning models can rapidly attach new items to their roles within this structure. To address transitions in neural activity during one-shot learning, we trained monkeys in an explore/exploit task using familiar objects and a highly familiar task structure. When learning was rapid, many frontal neurons showed a binary, one-shot switch between explore and exploit. Within an established task structure, frontal activity may control the separate operations of exploring alternative objects to establish their current role, then exploiting this knowledge for further reward.