Primate Orbitofrontal Cortex Codes Information Relevant for Managing Explore-Exploit Tradeoffs.

Affiliations

Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, Oregon 97239-3098, and

Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892-4415.

Publication information

J Neurosci. 2020 Mar 18;40(12):2553-2561. doi: 10.1523/JNEUROSCI.2355-19.2020. Epub 2020 Feb 14.

DOI: 10.1523/JNEUROSCI.2355-19.2020

PMID: 32060169

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7083541/

Abstract

Reinforcement learning (RL) refers to the behavioral process of learning to obtain reward and avoid punishment. An important component of RL is managing explore-exploit tradeoffs, which refers to the problem of choosing between exploiting options with known values and exploring unfamiliar options. We examined correlates of this tradeoff, as well as other RL related variables, in orbitofrontal cortex (OFC) while three male monkeys performed a three-armed bandit learning task. During the task, novel choice options periodically replaced familiar options. The values of the novel options were unknown, and the monkeys had to explore them to see if they were better than other currently available options. The identity of the chosen stimulus and the reward outcome were strongly encoded in the responses of single OFC neurons. These two variables define the states and state transitions in our model that are relevant to decision-making. The chosen value of the option and the relative value of exploring that option were encoded at intermediate levels. We also found that OFC value coding was stimulus specific, as opposed to coding value independent of the identity of the option. The location of the option and the value of the current environment were encoded at low levels. Therefore, we found encoding of the variables relevant to learning and managing explore-exploit tradeoffs in OFC. These results are consistent with findings in the ventral striatum and amygdala and show that this monosynaptically connected network plays an important role in learning based on the immediate and future consequences of choices.

Significance Statement

Orbitofrontal cortex (OFC) has been implicated in representing the expected values of choices. Here we extend these results and show that OFC also encodes information relevant to managing explore-exploit tradeoffs. Specifically, OFC encodes an exploration bonus, which characterizes the relative value of exploring novel choice options. OFC also strongly encodes the identity of the chosen stimulus, and reward outcomes, which are necessary for computing the value of novel and familiar options.
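
The task structure and the idea of an exploration bonus described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' model: the function name `run_bandit`, the reward probabilities, the swap interval, the learning rate, and the count-based bonus are all assumptions chosen for illustration, and the simple decaying bonus only stands in for whatever quantity the paper's model actually computes.

```python
import random

def run_bandit(n_trials=300, swap_every=20, bonus=0.3, lr=0.2, seed=0):
    """Simulate a three-armed bandit in which a novel option periodically
    replaces a familiar one. Every number here is an illustrative assumption."""
    rng = random.Random(seed)

    def new_option():
        # Hidden reward probability, learned value estimate, times chosen.
        return {"p": rng.choice([0.2, 0.5, 0.8]), "value": 0.5, "n": 0}

    options = [new_option() for _ in range(3)]
    history = []
    for t in range(n_trials):
        if t > 0 and t % swap_every == 0:
            # A novel option with unknown value replaces a random familiar one.
            options[rng.randrange(3)] = new_option()
        # Score = learned value + a bonus that shrinks as the option is sampled
        # (hypothetical stand-in for an exploration bonus).
        scores = [o["value"] + bonus / (1 + o["n"]) for o in options]
        choice = max(range(3), key=lambda i: scores[i])
        reward = 1 if rng.random() < options[choice]["p"] else 0
        # Delta-rule update of the chosen option's value estimate.
        chosen = options[choice]
        chosen["value"] += lr * (reward - chosen["value"])
        chosen["n"] += 1
        history.append((t, choice, reward))
    return history
```

Calling `run_bandit()` returns a list of `(trial, choice, reward)` tuples. In this sketch a freshly introduced option starts with the largest bonus, so it tends to be sampled soon after it appears unless a familiar option already has a clearly higher learned value.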

Similar articles

Amygdala Contributions to Stimulus-Reward Encoding in the Macaque Medial and Orbital Frontal Cortex during Learning.

J Neurosci. 2017 Feb 22;37(8):2186-2202. doi: 10.1523/JNEUROSCI.0933-16.2017. Epub 2017 Jan 25.

The neurocomputational bases of explore-exploit decision-making.

Neuron. 2022 Jun 1;110(11):1869-1879.e5. doi: 10.1016/j.neuron.2022.03.014. Epub 2022 Apr 6.

Differential coding of goals and actions in ventral and dorsal corticostriatal circuits during goal-directed behavior.

Cell Rep. 2022 Jan 4;38(1):110198. doi: 10.1016/j.celrep.2021.110198.

The Role of Orbitofrontal-Amygdala Interactions in Updating Action-Outcome Valuations in Macaques.

J Neurosci. 2017 Mar 1;37(9):2463-2470. doi: 10.1523/JNEUROSCI.1839-16.2017. Epub 2017 Feb 1.

Subcortical Substrates of Explore-Exploit Decisions in Primates.

Neuron. 2019 Aug 7;103(3):533-545.e5. doi: 10.1016/j.neuron.2019.05.017. Epub 2019 Jun 10.

Motor System-Dependent Effects of Amygdala and Ventral Striatum Lesions on Explore-Exploit Behaviors.

J Neurosci. 2024 Jan 31;44(5):e1206232023. doi: 10.1523/JNEUROSCI.1206-23.2023.

Effects of amygdala lesions on reward-value coding in orbital and medial prefrontal cortex.

Neuron. 2013 Dec 18;80(6):1519-31. doi: 10.1016/j.neuron.2013.09.036.

Partial Adaptation to the Value Range in the Macaque Orbitofrontal Cortex.

J Neurosci. 2019 May 1;39(18):3498-3513. doi: 10.1523/JNEUROSCI.2279-18.2019. Epub 2019 Mar 4.

Ventral frontostriatal circuitry mediates the computation of reinforcement from symbolic gains and losses.

Neuron. 2024 Nov 20;112(22):3782-3795.e5. doi: 10.1016/j.neuron.2024.08.018. Epub 2024 Sep 24.

Cited by

Rate and noise in human amygdala drive increased exploration in aversive learning.

Nature. 2025 Aug 27. doi: 10.1038/s41586-025-09466-1.

Home-Cage Training for Non-Human Primates: An Opportunity to Reduce Stress and Study Natural Behavior in Neurophysiology Experiments.

Animals (Basel). 2025 May 6;15(9):1340. doi: 10.3390/ani15091340.

Reward monitoring in the frontopolar cortex of macaques.

Sci Rep. 2025 May 12;15(1):16472. doi: 10.1038/s41598-025-99019-3.

Neural dynamics of semantic control underlying generative storytelling.

Commun Biol. 2025 Mar 28;8(1):513. doi: 10.1038/s42003-025-07913-3.

A subcortical switchboard for perseverative, exploratory and disengaged states.

Nature. 2025 May;641(8061):151-161. doi: 10.1038/s41586-025-08672-1. Epub 2025 Mar 5.

Electrophysiological Markers of Aberrant Cue-Specific Exploration in Hazardous Drinkers.

Comput Psychiatr. 2023 Jul 28;7(1):47-59. doi: 10.5334/cpsy.96. eCollection 2023.

Preferences reveal dissociable encoding across prefrontal-limbic circuits.

Neuron. 2024 Jul 3;112(13):2241-2256.e8. doi: 10.1016/j.neuron.2024.03.020. Epub 2024 Apr 18.

Neurons in the monkey frontopolar cortex encode learning stage and goal during a fast learning task.

PLoS Biol. 2024 Feb 16;22(2):e3002500. doi: 10.1371/journal.pbio.3002500. eCollection 2024 Feb.

Motor System-Dependent Effects of Amygdala and Ventral Striatum Lesions on Explore-Exploit Behaviors.

J Neurosci. 2024 Jan 31;44(5):e1206232023. doi: 10.1523/JNEUROSCI.1206-23.2023.

Curiosity: primate neural circuits for novelty and information seeking.

Nat Rev Neurosci. 2024 Mar;25(3):195-208. doi: 10.1038/s41583-023-00784-9. Epub 2024 Jan 23.
