Botvinick Matthew M, Niv Yael, Barto Andrew G
Princeton Neuroscience Institute, Department of Psychology, Princeton University, Green Hall, Princeton, NJ 08540, United States.
Cognition. 2009 Dec;113(3):262-280. doi: 10.1016/j.cognition.2008.08.011. Epub 2008 Oct 15.
Research on human and animal behavior has long emphasized its hierarchical structure: the divisibility of ongoing behavior into discrete tasks, which are composed of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior.
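The core idea the abstract describes, namely extending reinforcement learning so that an agent can aggregate primitive actions into reusable subroutines, is commonly formalized as the "options" framework. The following is a minimal illustrative sketch of that framework, not the paper's own implementation; the toy environment, the option name, and all function signatures are assumptions introduced here for illustration:

```python
from collections import defaultdict

class Option:
    """A temporally extended 'subroutine': a primitive-action policy
    plus a termination condition. Illustrative sketch only; names and
    signatures are assumptions, not the paper's implementation."""
    def __init__(self, name, policy, terminates):
        self.name = name
        self.policy = policy          # state -> primitive action
        self.terminates = terminates  # state -> bool

def run_option(env_step, state, option, gamma=0.9):
    """Execute an option until its termination condition fires.
    Returns the discounted reward accumulated while it ran, the
    resulting state, and the number of primitive steps elapsed."""
    total, discount, k = 0.0, 1.0, 0
    while True:
        action = option.policy(state)
        state, reward = env_step(state, action)
        total += discount * reward
        discount *= gamma
        k += 1
        if option.terminates(state):
            return total, state, k

def smdp_q_update(Q, s, o, r, s_next, k, options, alpha=0.1, gamma=0.9):
    """SMDP Q-learning update over options:
    Q(s,o) <- Q(s,o) + alpha * [r + gamma**k * max_o' Q(s',o') - Q(s,o)],
    where k is the number of primitive steps the option took."""
    best_next = max(Q[(s_next, o2.name)] for o2 in options)
    Q[(s, o.name)] += alpha * (r + gamma ** k * best_next - Q[(s, o.name)])

# Hypothetical toy environment: a chain of states 0..4, with a reward
# of 1 delivered on reaching state 4.
def env_step(state, action):
    nxt = max(0, min(4, state + action))
    return nxt, (1.0 if nxt == 4 else 0.0)

# One reusable subroutine: keep moving right until state 4 is reached.
to_goal = Option("to-goal", policy=lambda s: +1, terminates=lambda s: s == 4)

Q = defaultdict(float)
r, s_next, k = run_option(env_step, 0, to_goal)
smdp_q_update(Q, 0, to_goal, r, s_next, k, [to_goal])
```

Treating the whole subroutine as a single choice point (the SMDP update above) is what lets learned skills be reused as building blocks across tasks, which is the property the abstract highlights as central to hierarchical reinforcement learning.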