Reinforcement learning in multidimensional environments relies on attention mechanisms.

Authors

Niv Yael, Daniel Reka, Geana Andra, Gershman Samuel J, Leong Yuan Chang, Radulescu Angela, Wilson Robert C

Affiliations

Department of Psychology and Neuroscience Institute, Princeton University, Princeton, New Jersey 08540.

Publication information

J Neurosci. 2015 May 27;35(21):8145-57. doi: 10.1523/JNEUROSCI.2978-14.2015.

DOI:10.1523/JNEUROSCI.2978-14.2015

PMID:26019331

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4444538/

Abstract

In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning.

Similar articles

Reinforcement learning in multidimensional environments relies on attention mechanisms.

J Neurosci. 2015 May 27;35(21):8145-57. doi: 10.1523/JNEUROSCI.2978-14.2015.

Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments.

Neuron. 2017 Jan 18;93(2):451-463. doi: 10.1016/j.neuron.2016.12.040.

Neural Mechanisms for Undoing the "Curse of Dimensionality".

J Neurosci. 2015 Sep 2;35(35):12083-4. doi: 10.1523/JNEUROSCI.2428-15.2015.

Intact Reinforcement Learning But Impaired Attentional Control During Multidimensional Probabilistic Learning in Older Adults.

J Neurosci. 2020 Jan 29;40(5):1084-1096. doi: 10.1523/JNEUROSCI.0254-19.2019. Epub 2019 Dec 11.

Reward-dependent learning in neuronal networks for planning and decision making.

Prog Brain Res. 2000;126:217-29. doi: 10.1016/S0079-6123(00)26016-0.

The effects of aging on the interaction between reinforcement learning and attention.

Psychol Aging. 2016 Nov;31(7):747-757. doi: 10.1037/pag0000112. Epub 2016 Sep 5.

Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness.

J Cogn Neurosci. 2016 Feb;28(2):333-49. doi: 10.1162/jocn_a_00894. Epub 2015 Oct 21.

J Cogn Neurosci. 2004 Apr;16(3):463-78. doi: 10.1162/089892904322926791.

A neurocomputational model of dopamine and prefrontal-striatal interactions during multicue category learning by Parkinson patients.

J Cogn Neurosci. 2011 Jan;23(1):151-67. doi: 10.1162/jocn.2010.21420.

Goal-Directed and Habit-Like Modulations of Stimulus Processing during Reinforcement Learning.

J Neurosci. 2017 Mar 15;37(11):3009-3017. doi: 10.1523/JNEUROSCI.3205-16.2017. Epub 2017 Feb 13.

Cited by

How working memory and reinforcement learning interact when avoiding punishment and pursuing reward concurrently.

J Exp Psychol Gen. 2025 Sep 1. doi: 10.1037/xge0001817.

Feature-based reward learning shapes human social learning strategies.

Nat Hum Behav. 2025 Jul 23. doi: 10.1038/s41562-025-02269-4.

Striatal Gradient in Value-Decay Explains Regional Differences in Dopamine Patterns and Reinforcement Learning Computations.

J Neurosci. 2025 Jul 18. doi: 10.1523/JNEUROSCI.0170-25.2025.

Computational markers show specific deficits for dyslexia and ADHD in complex learning settings.

NPJ Sci Learn. 2025 Jun 13;10(1):38. doi: 10.1038/s41539-025-00323-4.

Examining the Relationship Between Early Experience, Selective Attention, and the Formation of Learning Traps.

Cogn Sci. 2025 May;49(5):e70070. doi: 10.1111/cogs.70070.

Reinforcement learning increasingly relates to memory specificity from childhood to adulthood.

Nat Commun. 2025 Apr 30;16(1):4074. doi: 10.1038/s41467-025-59379-w.

Humans learn generalizable representations through efficient coding.

Nat Commun. 2025 Apr 29;16(1):3989. doi: 10.1038/s41467-025-58848-6.

Free recall is shaped by inference and scaffolded by event structure.

Commun Psychol. 2025 Apr 26;3(1):71. doi: 10.1038/s44271-025-00243-4.

Bridging Species Differences in Rule Switching: How Humans and Monkeys Solve the Same Wisconsin Card Sorting Task.

J Neurosci. 2025 Apr 16;45(16):e2288242025. doi: 10.1523/JNEUROSCI.2288-24.2025.

The attentional boost effect: current landscape and future directions.

Cogn Process. 2025 Mar 14. doi: 10.1007/s10339-025-01266-9.
