Department of Psychology, University of California, Berkeley, CA 94704.
Proc Natl Acad Sci U S A. 2020 Nov 24;117(47):29381-29389. doi: 10.1073/pnas.1912330117.
Humans have the fascinating ability to achieve goals in a complex and constantly changing world, still surpassing modern machine-learning algorithms in terms of flexibility and learning speed. It is generally accepted that a crucial factor for this ability is the use of abstract, hierarchical representations, which exploit structure in the environment to guide learning and decision making. Nevertheless, how we create and use these hierarchical representations is poorly understood. This study presents evidence that human behavior can be characterized as hierarchical reinforcement learning (RL). We designed an experiment to test specific predictions of hierarchical RL using a series of subtasks in the realm of context-based learning and observed several behavioral markers of hierarchical RL, such as asymmetric switch costs between changes in higher-level versus lower-level features, faster learning in higher-valued compared to lower-valued contexts, and a preference for higher-valued over lower-valued contexts. We replicated these results across three independent samples. We simulated three models (a flat RL model, a hierarchical RL model, and a hierarchical Bayesian model) and compared their behavior to human results. While the flat RL model captured some aspects of participants' sensitivity to outcome values, and the hierarchical Bayesian model captured some markers of transfer, only hierarchical RL accounted for all patterns observed in human behavior. This work shows that hierarchical RL, a biologically inspired and computationally simple algorithm, can capture human behavior in complex, hierarchical environments, and opens avenues for future research in this field.
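The hierarchical RL idea described above can be illustrated with a minimal sketch: a two-level agent that learns values at both the context ("task-set") level and the action level within each context, each via a simple delta-rule update. This is an illustrative assumption, not the paper's actual model; the class name, learning rates, and environment here are hypothetical.

```python
import random

# Minimal two-level hierarchical RL sketch (illustrative assumption,
# not the published model). The higher level tracks the value of each
# context; the lower level tracks action values within each context.
class HierarchicalRLAgent:
    def __init__(self, n_contexts, n_actions, alpha_hi=0.1, alpha_lo=0.3):
        self.alpha_hi = alpha_hi  # learning rate, context (higher) level
        self.alpha_lo = alpha_lo  # learning rate, action (lower) level
        self.v_context = [0.0] * n_contexts                       # higher-level values
        self.q = [[0.0] * n_actions for _ in range(n_contexts)]   # lower-level Q-values

    def choose_action(self, context, epsilon=0.1):
        # epsilon-greedy over the action values of the current context
        qs = self.q[context]
        if random.random() < epsilon:
            return random.randrange(len(qs))
        return max(range(len(qs)), key=qs.__getitem__)

    def update(self, context, action, reward):
        # lower level: delta-rule update of the chosen action's value
        self.q[context][action] += self.alpha_lo * (reward - self.q[context][action])
        # higher level: delta-rule update of the context's overall value
        self.v_context[context] += self.alpha_hi * (reward - self.v_context[context])

random.seed(0)
agent = HierarchicalRLAgent(n_contexts=2, n_actions=3)
for _ in range(200):
    a = agent.choose_action(0)
    r = 1.0 if a == 2 else 0.0  # in context 0, only action 2 pays off
    agent.update(0, a, r)
```

After training, the agent's lower level favors the rewarded action within context 0, while its higher level has accumulated a positive value for context 0 itself; the latter is the kind of quantity that could drive the context-value effects (faster learning in, and preference for, higher-valued contexts) reported above.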