Matsumoto Takazumi, Tani Jun
Okinawa Institute of Science and Technology, Okinawa 904-0495, Japan.
Entropy (Basel). 2020 May 18;22(5):564. doi: 10.3390/e22050564.
A central question is how agents can achieve goals by generating action plans using only partial models of the world acquired through habituated sensory-motor experiences. Although many existing robotics studies use a forward model framework, this framework generalizes poorly in systems with many degrees of freedom. The current study shows that the predictive coding (PC) and active inference (AIF) frameworks, which employ a generative model, can generalize better by learning a prior distribution in a low-dimensional latent state space that represents probabilistic structures extracted from well-habituated sensory-motor trajectories. In our proposed model, learning is carried out by inferring optimal latent variables as well as synaptic weights that maximize the evidence lower bound, while goal-directed planning is accomplished by inferring latent variables that maximize the estimated lower bound. The proposed model was evaluated on both simple and complex robotic tasks in simulation, where it generalized sufficiently from limited training data when the regularization coefficient was set to an intermediate value. Furthermore, comparative simulation results show that the proposed model outperforms a conventional forward model in goal-directed planning, because the learned prior confines the search for motor plans to the range of habituated trajectories.
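The abstract does not state the form of the lower bound, so the following is only a minimal sketch of how a regularization coefficient can trade off prediction accuracy against closeness to a learned prior. It assumes a diagonal Gaussian posterior and prior, a unit-variance Gaussian likelihood (up to an additive constant), and a coefficient `w` that scales the KL term. The function names `gaussian_kl` and `weighted_lower_bound` are hypothetical and are not taken from the paper.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions."""
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p)
    )


def weighted_lower_bound(x, x_pred, mu_q, sigma_q, mu_p, sigma_p, w):
    """Accuracy term minus w times the complexity (KL) term.

    Assumption: accuracy is the unit-variance Gaussian log-likelihood
    with its additive constant dropped, i.e. -0.5 * squared error.
    """
    accuracy = -0.5 * sum((a - b) ** 2 for a, b in zip(x, x_pred))
    complexity = gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)
    return accuracy - w * complexity


# Same prediction error, different regularization strength:
# a larger w penalizes a posterior that strays from the prior more heavily.
loose = weighted_lower_bound([1.0], [0.5], [1.0], [1.0], [0.0], [1.0], w=0.1)
tight = weighted_lower_bound([1.0], [0.5], [1.0], [1.0], [0.0], [1.0], w=1.0)
print(loose > tight)
```

In this sketch, a very small `w` lets the posterior ignore the prior, while a very large `w` forces the posterior onto the prior at the cost of prediction accuracy. This is one way to read the abstract's remark that an intermediate value of the coefficient gave sufficient generalization.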