Helen Wills Neuroscience Institute, University of California, Berkeley, United States of America.
Helen Wills Neuroscience Institute, University of California, Berkeley, United States of America; Department of Psychology, University of California, Berkeley, United States of America.
Cognition. 2025 Jan;254:105967. doi: 10.1016/j.cognition.2024.105967. Epub 2024 Oct 4.
Learning structures that effectively abstract decision policies is key to the flexibility of human intelligence. Previous work has shown that humans use hierarchically structured policies to efficiently navigate complex and dynamic environments. However, the computational processes that support the learning and construction of such policies remain insufficiently understood. To address this gap, we tested 1026 human participants, who made over 1 million choices combined, in a decision-making task where they could learn, transfer, and recompose multiple sets of hierarchical policies. We propose a novel algorithmic account of the learning processes underlying the observed human behavior. We show that humans rely on compressed policies over states in early learning, which gradually unfold into hierarchical representations via meta-learning and Bayesian inference. Our modeling evidence suggests that these hierarchical policies are structured in a temporally backward, rather than forward, fashion. Taken together, these algorithmic architectures characterize how the interplay between reinforcement learning, policy compression, meta-learning, and working memory supports structured decision-making and compositionality in a resource-rational way.
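To give intuition for what "compressed policies over states" means, the following is a minimal, hypothetical sketch of information-theoretic policy compression (a Blahut-Arimoto-style iteration), not the authors' actual model: the action policy is biased toward a state-independent marginal distribution, and a trade-off parameter `beta` (an assumed name) controls how much state-specific detail the policy retains. Early learning with low `beta` corresponds to a heavily compressed, nearly state-independent policy; increasing `beta` unfolds it into state-specific structure.

```python
import numpy as np

def compress_policy(Q, beta, n_iters=200):
    """Illustrative policy compression (hypothetical sketch, not the
    paper's model): iterate pi(a|s) ∝ p(a) * exp(beta * Q[s, a]),
    where p(a) is the marginal action distribution under a uniform
    state distribution. Low beta -> compressed, state-independent
    policy; high beta -> near-greedy, state-specific policy."""
    n_states, n_actions = Q.shape
    p = np.full(n_actions, 1.0 / n_actions)    # marginal action prior
    rho = np.full(n_states, 1.0 / n_states)    # uniform state distribution
    for _ in range(n_iters):
        # softmax of beta*Q biased by the log marginal (numerically stable)
        logits = beta * Q + np.log(p)
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        p = rho @ pi                            # update marginal action dist.
    return pi

# toy Q-values: each of 3 states prefers a distinct action
Q = np.eye(3)
pi_low = compress_policy(Q, beta=0.1)    # compressed: close to uniform
pi_high = compress_policy(Q, beta=50.0)  # unfolded: near-deterministic
```

The design choice here is standard in resource-rational accounts: the `log p` term penalizes policies that deviate from a cheap default, so capacity limits naturally produce the kind of compressed early-learning behavior the abstract describes.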