Helen Wills Neuroscience Institute, University of California, Berkeley, United States of America.
Helen Wills Neuroscience Institute, University of California, Berkeley, United States of America; Department of Psychology, University of California, Berkeley, United States of America.
Cognition. 2025 Jan;254:105967. doi: 10.1016/j.cognition.2024.105967. Epub 2024 Oct 4.
Learning structures that effectively abstract decision policies is key to the flexibility of human intelligence. Previous work has shown that humans use hierarchically structured policies to efficiently navigate complex and dynamic environments. However, the computational processes that support the learning and construction of such policies remain insufficiently understood. To address this gap, we tested 1026 human participants, who made over 1 million choices combined, in a decision-making task where they could learn, transfer, and recompose multiple sets of hierarchical policies. We propose a novel algorithmic account of the learning processes underlying the observed human behavior. We show that humans rely on compressed policies over states in early learning, which gradually unfold into hierarchical representations via meta-learning and Bayesian inference. Our modeling evidence suggests that these hierarchical policies are structured in a temporally backward, rather than forward, fashion. Taken together, these algorithmic architectures characterize how the interplay between reinforcement learning, policy compression, meta-learning, and working memory supports structured decision-making and compositionality in a resource-rational way.
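To give intuition for what "compressed policies over states" means, the following is a minimal, hypothetical sketch of information-theoretic policy compression (a Blahut-Arimoto-style iteration), not the authors' actual model: the action policy is biased toward a state-independent marginal distribution, and a trade-off parameter `beta` (an assumed name) controls how much state-specific detail the policy retains. Early learning with low `beta` corresponds to a heavily compressed, nearly state-independent policy; increasing `beta` unfolds it into state-specific structure.

```python
import numpy as np

def compress_policy(Q, beta, n_iters=200):
    """Illustrative policy compression (hypothetical sketch, not the
    paper's model): iterate pi(a|s) ∝ p(a) * exp(beta * Q[s, a]),
    where p(a) is the marginal action distribution under a uniform
    state distribution. Low beta -> compressed, state-independent
    policy; high beta -> near-greedy, state-specific policy."""
    n_states, n_actions = Q.shape
    p = np.full(n_actions, 1.0 / n_actions)    # marginal action prior
    rho = np.full(n_states, 1.0 / n_states)    # uniform state distribution
    for _ in range(n_iters):
        # softmax of beta*Q biased by the log marginal (numerically stable)
        logits = beta * Q + np.log(p)
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        p = rho @ pi                            # update marginal action dist.
    return pi

# toy Q-values: each of 3 states prefers a distinct action
Q = np.eye(3)
pi_low = compress_policy(Q, beta=0.1)    # compressed: close to uniform
pi_high = compress_policy(Q, beta=50.0)  # unfolded: near-deterministic
```

The design choice here is standard in resource-rational accounts: the `log p` term penalizes policies that deviate from a cheap default, so capacity limits naturally produce the kind of compressed early-learning behavior the abstract describes.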