Human reinforcement learning subdivides structured action spaces by learning effector-specific values.

Authors

Gershman Samuel J, Pesaran Bijan, Daw Nathaniel D

Affiliation

Center for Neural Science, New York University, New York, New York 10003, USA.

Publication

J Neurosci. 2009 Oct 28;29(43):13524-31. doi: 10.1523/JNEUROSCI.2469-09.2009.

Abstract

Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning-such as prediction error signals for action valuation associated with dopamine and the striatum-can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
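
For readers who want the gist of the model comparison, here is a minimal sketch (in Python, not the authors' code) of the two learning rules the abstract contrasts: a unitary learner that assigns a single value to each joint bimanual action, and a decomposed learner that maintains separate, effector-specific values updated by per-hand prediction errors. The learning rate, softmax choice rule, and two-targets-per-hand task layout are illustrative assumptions, not details taken from the paper.

```python
# Sketch of unitary vs. effector-decomposed value learning.
# ALPHA, BETA, and the 2-target-per-hand layout are assumptions.
import numpy as np

rng = np.random.default_rng(0)
ALPHA, BETA = 0.3, 3.0   # assumed learning rate and inverse temperature
N_TARGETS = 2            # assumed: each hand chooses between 2 targets

def softmax(q, beta=BETA):
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

# Unitary model: one value per joint (left, right) action pair.
q_joint = np.zeros((N_TARGETS, N_TARGETS))

# Decomposed model: independent values for each effector.
q_left = np.zeros(N_TARGETS)
q_right = np.zeros(N_TARGETS)

def step(p_reward_left, p_reward_right):
    """One trial: both models choose, then learn from per-hand feedback."""
    # Decomposed learner picks each hand's target from that hand's values
    # and updates each effector's value with its own prediction error.
    l = rng.choice(N_TARGETS, p=softmax(q_left))
    r = rng.choice(N_TARGETS, p=softmax(q_right))
    r_l = float(rng.random() < p_reward_left[l])
    r_r = float(rng.random() < p_reward_right[r])
    q_left[l] += ALPHA * (r_l - q_left[l])
    q_right[r] += ALPHA * (r_r - q_right[r])
    # Unitary learner treats the bimanual pair as one action with one
    # value, updated toward the summed reward of both hands.
    idx = rng.choice(N_TARGETS * N_TARGETS, p=softmax(q_joint.ravel()))
    jl, jr = divmod(idx, N_TARGETS)
    total = float(rng.random() < p_reward_left[jl]) + \
            float(rng.random() < p_reward_right[jr])
    q_joint[jl, jr] += ALPHA * (total - q_joint[jl, jr])

for _ in range(500):
    step(np.array([0.8, 0.2]), np.array([0.3, 0.7]))
print("decomposed:", q_left.round(2), q_right.round(2))
print("unitary:\n", q_joint.round(2))
```

The point of the decomposition is scale: with separate feedback per hand, the decomposed learner estimates N + N effector values rather than N x N joint values, which is the "divide and conquer" advantage over high-dimensional action spaces that the abstract describes.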


Similar articles

Action selection in multi-effector decision making.
Neuroimage. 2013 Apr 15;70:66-79. doi: 10.1016/j.neuroimage.2012.12.001. Epub 2012 Dec 7.

Generalization of value in reinforcement learning by humans.
Eur J Neurosci. 2012 Apr;35(7):1092-104. doi: 10.1111/j.1460-9568.2012.08017.x.

How instructed knowledge modulates the neural systems of reward learning.
Proc Natl Acad Sci U S A. 2011 Jan 4;108(1):55-60. doi: 10.1073/pnas.1014938108. Epub 2010 Dec 20.

Causal Inference Gates Corticostriatal Learning.
J Neurosci. 2021 Aug 11;41(32):6892-6904. doi: 10.1523/JNEUROSCI.2796-20.2021. Epub 2021 Jul 9.

