Similar Articles

1. Human reinforcement learning subdivides structured action spaces by learning effector-specific values.
   J Neurosci. 2009 Oct 28;29(43):13524-31. doi: 10.1523/JNEUROSCI.2469-09.2009.
2. A reinforcement learning mechanism responsible for the valuation of free choice.
   Neuron. 2014 Aug 6;83(3):551-7. doi: 10.1016/j.neuron.2014.06.035. Epub 2014 Jul 24.
3. Action selection in multi-effector decision making.
   Neuroimage. 2013 Apr 15;70:66-79. doi: 10.1016/j.neuroimage.2012.12.001. Epub 2012 Dec 7.
4. Generalization of value in reinforcement learning by humans.
   Eur J Neurosci. 2012 Apr;35(7):1092-104. doi: 10.1111/j.1460-9568.2012.08017.x.
5. Neural prediction errors reveal a risk-sensitive reinforcement-learning process in the human brain.
   J Neurosci. 2012 Jan 11;32(2):551-62. doi: 10.1523/JNEUROSCI.5498-10.2012.
6. Dopamine-mediated reinforcement learning signals in the striatum and ventromedial prefrontal cortex underlie value-based choices.
   J Neurosci. 2011 Feb 2;31(5):1606-13. doi: 10.1523/JNEUROSCI.3904-10.2011.
7. Signals in human striatum are appropriate for policy update rather than value prediction.
   J Neurosci. 2011 Apr 6;31(14):5504-11. doi: 10.1523/JNEUROSCI.6316-10.2011.
8. How instructed knowledge modulates the neural systems of reward learning.
   Proc Natl Acad Sci U S A. 2011 Jan 4;108(1):55-60. doi: 10.1073/pnas.1014938108. Epub 2010 Dec 20.
9. Navigating complex decision spaces: Problems and paradigms in sequential choice.
   Psychol Bull. 2014 Mar;140(2):466-86. doi: 10.1037/a0033455. Epub 2013 Jul 8.
10. Causal Inference Gates Corticostriatal Learning.
    J Neurosci. 2021 Aug 11;41(32):6892-6904. doi: 10.1523/JNEUROSCI.2796-20.2021. Epub 2021 Jul 9.

Cited By

1. A feature-specific prediction error model explains dopaminergic heterogeneity.
   Nat Neurosci. 2024 Aug;27(8):1574-1586. doi: 10.1038/s41593-024-01689-1. Epub 2024 Jul 3.
2. Surprise-minimization as a solution to the structural credit assignment problem.
   PLoS Comput Biol. 2024 May 28;20(5):e1012175. doi: 10.1371/journal.pcbi.1012175. eCollection 2024 May.
3. Active reinforcement learning versus action bias and hysteresis: control with a mixture of experts and nonexperts.
   PLoS Comput Biol. 2024 Mar 29;20(3):e1011950. doi: 10.1371/journal.pcbi.1011950. eCollection 2024 Mar.
4. Value Estimation versus Effort Mobilization: A General Dissociation between Ventromedial and Dorsomedial Prefrontal Cortex.
   J Neurosci. 2024 Apr 24;44(17):e1176232024. doi: 10.1523/JNEUROSCI.1176-23.2024.
5. Decision-making processes in perceptual learning depend on effectors.
   Sci Rep. 2024 Mar 7;14(1):5644. doi: 10.1038/s41598-024-55508-5.
6. Neural and computational underpinnings of biased confidence in human reinforcement learning.
   Nat Commun. 2023 Oct 28;14(1):6896. doi: 10.1038/s41467-023-42589-5.
7. Having multiple selves helps learning agents explore and adapt in complex changing worlds.
   Proc Natl Acad Sci U S A. 2023 Jul 11;120(28):e2221180120. doi: 10.1073/pnas.2221180120. Epub 2023 Jul 3.
8. The motivational role of the ventral striatum and amygdala in learning from gains and losses.
   Behav Neurosci. 2023 Aug;137(4):268-280. doi: 10.1037/bne0000558. Epub 2023 May 4.
9. Individuals with problem gambling and obsessive-compulsive disorder learn through distinct reinforcement mechanisms.
   PLoS Biol. 2023 Mar 14;21(3):e3002031. doi: 10.1371/journal.pbio.3002031. eCollection 2023 Mar.
10. Neuroprotection in late life attention-deficit/hyperactivity disorder: A review of pharmacotherapy and phenotype across the lifespan.
    Front Hum Neurosci. 2022 Sep 26;16:938501. doi: 10.3389/fnhum.2022.938501. eCollection 2022.

References

1. How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action.
   Neuron. 2009 Jun 11;62(5):733-43. doi: 10.1016/j.neuron.2009.05.014.
2. Circular analysis in systems neuroscience: the dangers of double dipping.
   Nat Neurosci. 2009 May;12(5):535-40. doi: 10.1038/nn.2303.
3. Behavioral and neural changes after gains and losses of conditioned reinforcers.
   J Neurosci. 2009 Mar 18;29(11):3627-41. doi: 10.1523/JNEUROSCI.4726-08.2009.
4. Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli.
   Proc Natl Acad Sci U S A. 2009 Mar 24;106(12):4894-9. doi: 10.1073/pnas.0811507106. Epub 2009 Mar 4.
5. Single nigrostriatal dopaminergic neurons form widely spread and highly dense axonal arborizations in the neostriatum.
   J Neurosci. 2009 Jan 14;29(2):444-53. doi: 10.1523/JNEUROSCI.4029-08.2009.
6. Reinforcement learning: the good, the bad and the ugly.
   Curr Opin Neurobiol. 2008 Apr;18(2):185-96. doi: 10.1016/j.conb.2008.08.003. Epub 2008 Aug 22.
7. The discovery of structural form.
   Proc Natl Acad Sci U S A. 2008 Aug 5;105(31):10687-92. doi: 10.1073/pnas.0802631105. Epub 2008 Jul 31.
8. Determining a role for ventromedial prefrontal cortex in encoding action-based value signals during reward-related decision making.
   Cereb Cortex. 2009 Feb;19(2):483-95. doi: 10.1093/cercor/bhn098. Epub 2008 Jun 11.
9. Dissociating the role of the orbitofrontal cortex and the striatum in the computation of goal values and prediction errors.
   J Neurosci. 2008 May 28;28(22):5623-30. doi: 10.1523/JNEUROSCI.1309-08.2008.
10. Value representations in the primate striatum during matching behavior.
    Neuron. 2008 May 8;58(3):451-63. doi: 10.1016/j.neuron.2008.02.021.

Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

Authors

Gershman Samuel J, Pesaran Bijan, Daw Nathaniel D

Affiliations

Center for Neural Science, New York University, New York, New York 10003, USA.

Publication Information

J Neurosci. 2009 Oct 28;29(43):13524-31. doi: 10.1523/JNEUROSCI.2469-09.2009.

DOI: 10.1523/JNEUROSCI.2469-09.2009
PMID: 19864565
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2796632/
Abstract

Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning (such as prediction error signals for action valuation associated with dopamine and the striatum) can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
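
To make the abstract's central modeling contrast concrete, the sketch below simulates both learners on a toy task. It is a minimal illustration, not the authors' actual model or experiment: the two-target bandit, the reward probabilities (p_reward_left, p_reward_right), and the parameters alpha (learning rate) and beta (softmax inverse temperature) are all assumed values chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed, not the paper's task): each hand chooses
# between two targets with independent, made-up reward probabilities.
n_targets = 2
p_reward_left = np.array([0.8, 0.2])
p_reward_right = np.array([0.3, 0.7])
alpha, beta = 0.1, 3.0  # hypothetical learning rate and inverse temperature

def softmax(q):
    e = np.exp(beta * (q - q.max()))  # subtract max for numerical stability
    return e / e.sum()

# Factored model: one value table per effector (left hand, right hand).
q_left = np.zeros(n_targets)
q_right = np.zeros(n_targets)

# Unitary model: one value per joint (left, right) bimanual action.
q_joint = np.zeros((n_targets, n_targets))

for trial in range(500):
    # Each hand picks a target; reward feedback arrives separately per hand.
    a_l = rng.choice(n_targets, p=softmax(q_left))
    a_r = rng.choice(n_targets, p=softmax(q_right))
    r_l = float(rng.random() < p_reward_left[a_l])
    r_r = float(rng.random() < p_reward_right[a_r])

    # Factored update: an effector-specific prediction error assigns credit
    # to the hand that actually earned (or missed) its own reward.
    q_left[a_l] += alpha * (r_l - q_left[a_l])
    q_right[a_r] += alpha * (r_r - q_right[a_r])

    # Unitary update: a single prediction error over the summed reward,
    # so a win with one hand masks a loss with the other.
    q_joint[a_l, a_r] += alpha * ((r_l + r_r) - q_joint[a_l, a_r])

print("left-hand values:  ", q_left.round(2))   # roughly approaches [0.8, 0.2]
print("right-hand values: ", q_right.round(2))  # roughly approaches [0.3, 0.7]
print("joint values:\n", q_joint.round(2))      # approaches sums of each pair
```

The factored learner estimates only 2 + 2 values and credits each hand's reward directly through its own prediction error; the unitary learner must estimate 2 × 2 joint values from a single summed-reward error, and that gap grows combinatorially with more effectors or targets, which is the "curse of dimensionality" the decomposition is meant to tame.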
