Department of Computer Science, The University of Texas at Austin, Austin, TX, USA.
Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA.
PLoS Comput Biol. 2018 Oct 25;14(10):e1006518. doi: 10.1371/journal.pcbi.1006518. eCollection 2018 Oct.
Although a standard reinforcement learning model can capture many aspects of reward-seeking behavior, it may not be practical for modeling natural human behavior because of the richness of dynamic environments and limitations in cognitive resources. We propose a modular reinforcement learning model that addresses these factors. Based on this model, we develop a modular inverse reinforcement learning algorithm that estimates both the rewards and the discount factors from human behavioral data, which allows human navigation behavior in virtual reality to be predicted with high accuracy across different subjects and different tasks. An artificial agent based on the modular model can reproduce complex human navigation trajectories in novel environments. The model provides a strategy for estimating the subjective value of actions and how these values influence sensory-motor decisions in natural behavior.
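To make the idea of modules with their own rewards and discount factors concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a grid world, a closed-form module value of reward times discount factor raised to the Manhattan distance to the module's object, and an additive combination of module values. None of these details are given in the abstract, and all names (module_q, global_q, greedy_action) are hypothetical.

```python
# Illustrative sketch only: the grid world, the closed-form module value,
# and the additive combination of modules are assumptions for this example.

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # unit steps: +x, -x, +y, -y


def module_q(reward, gamma, distance):
    """Discounted value of an object worth `reward` that is `distance` steps away."""
    return reward * gamma ** distance


def global_q(modules, position, action):
    """Sum the module values at the position reached by taking `action`."""
    nx, ny = position[0] + action[0], position[1] + action[1]
    total = 0.0
    for m in modules:
        ox, oy = m["object"]
        distance = abs(ox - nx) + abs(oy - ny)  # Manhattan distance on the grid
        total += module_q(m["reward"], m["gamma"], distance)
    return total


def greedy_action(modules, position):
    """Pick the action with the highest summed module value."""
    return max(ACTIONS, key=lambda a: global_q(modules, position, a))


# Each module carries its own reward and discount factor (hypothetical values).
modules = [
    {"object": (3, 0), "reward": 1.0, "gamma": 0.9},   # target to approach
    {"object": (0, 1), "reward": -1.0, "gamma": 0.5},  # obstacle to avoid
]

best = greedy_action(modules, (0, 0))
print(best)
```

In this sketch, a larger discount factor lets a distant object still influence the choice, while a small one makes a module matter only when its object is nearby. An inverse procedure would search for the per-module rewards and discount factors under which observed human actions become most likely.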