Nakamura Yutaka, Mori Takeshi, Sato Masa-aki, Ishii Shin
Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan.
Neural Netw. 2007 Aug;20(6):723-35. doi: 10.1016/j.neunet.2007.01.002. Epub 2007 Feb 20.
Animals' rhythmic movements, such as locomotion, are thought to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, researchers have studied rhythmic movements controlled by CPGs. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method that we call the "CPG-actor-critic" method. The method introduces a new architecture for the actor, and its training is based roughly on a recently presented stochastic policy gradient algorithm. We apply the method to the problem of automatically acquiring control of a biped robot. Computer simulations show that our method trains the CPG successfully, allowing the biped robot not only to walk stably but also to adapt to environmental changes.
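To make the idea of a circuit that "generates oscillatory signals" concrete, the following is a minimal sketch of a self-sustained oscillator. It uses a Hopf oscillator as a stand-in, which is an assumption made for illustration and not necessarily the neural oscillator model used in the article. The function name `simulate_hopf` and all parameter values are hypothetical.

```python
import math


def simulate_hopf(mu=1.0, omega=2.0 * math.pi, dt=0.001, steps=5000,
                  x0=0.1, y0=0.0):
    """Integrate a Hopf oscillator with forward Euler.

    Illustrative stand-in for a CPG unit (assumption, not the article's model).
    The state (x, y) is attracted to a circle of radius sqrt(mu) and rotates
    at angular frequency omega, so x is a rhythmic signal that sustains
    itself without any periodic input.
    """
    x, y = x0, y0
    trajectory = []
    for _ in range(steps):
        r2 = x * x + y * y
        # Radial term pulls the amplitude toward sqrt(mu);
        # the omega terms rotate the state.
        dx = (mu - r2) * x - omega * y
        dy = (mu - r2) * y + omega * x
        x += dt * dx
        y += dt * dy
        trajectory.append((x, y))
    return trajectory


if __name__ == "__main__":
    traj = simulate_hopf(steps=10000)
    x_end, y_end = traj[-1]
    print("final amplitude:", math.hypot(x_end, y_end))
```

In a CPG controller of the kind the abstract describes, a signal such as `x` would drive the joints, and learning would adjust the oscillator's parameters or couplings. That mapping is not shown here.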