
Neural correlates of temporal credit assignment in the parietal lobe.

Authors

Gersch Timothy M, Foley Nicholas C, Eisenberg Ian, Gottlieb Jacqueline

Affiliations

Department of Neuroscience, Columbia University, New York, New York, United States of America.

Department of Neuroscience, Columbia University, New York, New York, United States of America; The Kavli Institute for Brain Science, Columbia University, New York, New York, United States of America.

Publication information

PLoS One. 2014 Feb 11;9(2):e88725. doi: 10.1371/journal.pone.0088725. eCollection 2014.

DOI:10.1371/journal.pone.0088725
PMID:24523935
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3921206/
Abstract

Empirical studies of decision making have typically assumed that value learning is governed by time, such that a reward prediction error arising at a specific time triggers temporally-discounted learning for all preceding actions. However, in natural behavior, goals must be acquired through multiple actions, and each action can have different significance for the final outcome. As is recognized in computational research, carrying out multi-step actions requires the use of credit assignment mechanisms that focus learning on specific steps, but little is known about the neural correlates of these mechanisms. To investigate this question we recorded neurons in the monkey lateral intraparietal area (LIP) during a serial decision task where two consecutive eye movement decisions led to a final reward. The underlying decision trees were structured such that the two decisions had different relationships with the final reward, and the optimal strategy was to learn based on the final reward at one of the steps (the "F" step) but ignore changes in this reward at the remaining step (the "I" step). In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and strongest post-decision responses during the transition after the F step regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting.
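The contrast the abstract draws, between learning governed purely by time and learning allocated to a specific step, can be illustrated with a toy sketch. This is not the authors' model or analysis; the function names, the learning rate `alpha`, the discount factor `gamma`, and the two-step value list are all assumptions chosen for illustration. The first rule spreads a reward prediction error over every preceding step with a weight that decays with temporal distance from the reward; the second updates only the designated "F" step and leaves the "I" step unchanged.

```python
def discounted_update(values, reward, alpha=0.1, gamma=0.9):
    """Time-governed rule (illustrative): every step learns from the final
    reward, with the update scaled by gamma ** (steps between it and reward)."""
    n = len(values)
    updated = []
    for i, v in enumerate(values):
        steps_before_reward = n - 1 - i          # 0 for the last step
        prediction_error = reward - v            # reward prediction error
        updated.append(v + alpha * (gamma ** steps_before_reward) * prediction_error)
    return updated


def selective_update(values, reward, f_step, alpha=0.1):
    """Credit-assignment rule (illustrative): only the F step learns from the
    final reward; the I step ignores it, whatever its serial position."""
    updated = list(values)
    updated[f_step] += alpha * (reward - values[f_step])
    return updated


if __name__ == "__main__":
    initial = [0.0, 0.0]   # hypothetical values for step 1 and step 2
    print(discounted_update(initial, reward=1.0))
    print(selective_update(initial, reward=1.0, f_step=0))
    print(selective_update(initial, reward=1.0, f_step=1))
```

Under the time-governed rule the earlier step always learns less than the later one, whereas under the selective rule the amount learned depends only on which step is the F step. This is the dissociation the two task contexts are designed to test.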


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc20/3921206/1598f45672e1/pone.0088725.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc20/3921206/e2c4e73556d6/pone.0088725.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc20/3921206/f0cf326b1f61/pone.0088725.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc20/3921206/75c2da940669/pone.0088725.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc20/3921206/c78a97e7e021/pone.0088725.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc20/3921206/9e2b662033f8/pone.0088725.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc20/3921206/5c7c387590ea/pone.0088725.g007.jpg
