Li Tzuu-Hseng S, Su Yu-Te, Lai Shao-Wei, Hu Jhen-Jia
Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan.
IEEE Trans Syst Man Cybern B Cybern. 2011 Jun;41(3):736-48. doi: 10.1109/TSMCB.2010.2089978. Epub 2010 Nov 18.
This paper proposes the implementation of fuzzy motion control based on reinforcement learning (RL) and Lagrange polynomial interpolation (LPI) for gait synthesis of biped robots. First, the procedure of a walking gait is redefined into three states, and the parameters of this designed walking gait are determined. Then, policy gradient RL (PGRL) is applied to adjust the walking parameters; it can run in real time and modify the policy directly without computing the system dynamics. Given a parameterized walking motion designed for biped robots, the PGRL algorithm automatically searches the space of possible parameters and finds the fastest possible walking motion. The reward function is based primarily on walking speed, which is estimated from the vision system. However, experiments show that this kind of learning process suffers from stability problems. To solve them, the desired zero-moment-point trajectory is added to the reward function. The results show that, after learning, the robot not only walks more stably but also walks faster. This is more effective and attractive than manual trial-and-error tuning. LPI is then employed to transform the existing motions into a motion whose revised angle is determined by the fuzzy motion controller, so the biped robot can walk continuously in any desired direction under this fuzzy motion control. Finally, the fuzzy-based gait synthesis control is demonstrated by point- and line-target tracking tasks. The experiments show the feasibility and effectiveness of gait learning with PGRL and the practicability of the proposed fuzzy motion control scheme.
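The PGRL gait tuning described in the abstract can be illustrated with a finite-difference policy gradient search, a common PGRL variant for parameterized gaits. This is a minimal sketch, not the paper's exact algorithm: the perturbation count, step sizes, and the `reward_fn` (which in the paper combines walking speed with the desired zero-moment-point trajectory) are illustrative assumptions.

```python
import random

def policy_gradient_search(params, reward_fn, step=0.05, epsilon=0.02, iters=50):
    """Finite-difference policy gradient search over gait parameters.

    A sketch of one common PGRL variant; the paper's exact update rule,
    evaluation procedure, and reward weighting are not reproduced here.
    """
    params = list(params)
    for _ in range(iters):
        # Evaluate several randomly perturbed copies of the parameter set.
        perturbations = [[random.choice((-epsilon, 0.0, epsilon))
                          for _ in params] for _ in range(8)]
        rewards = [reward_fn([p + d for p, d in zip(params, pert)])
                   for pert in perturbations]
        # Per-parameter gradient estimate: compare average reward of
        # positively vs. negatively perturbed trials, then step uphill.
        for i in range(len(params)):
            plus = [r for r, pert in zip(rewards, perturbations) if pert[i] > 0]
            minus = [r for r, pert in zip(rewards, perturbations) if pert[i] < 0]
            if plus and minus:
                grad = sum(plus) / len(plus) - sum(minus) / len(minus)
                if grad > 0:
                    params[i] += step
                elif grad < 0:
                    params[i] -= step
    return params
```

In the paper's setting, `reward_fn` would run the robot with the candidate gait parameters and return something like (estimated walking speed) minus a weighted zero-moment-point tracking error; here any callable mapping parameters to a scalar reward works.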
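The LPI step, transforming existing motions into one with a revised angle, rests on standard Lagrange polynomial interpolation. A minimal sketch follows; treating the sample points as (phase, joint-angle) pairs is an illustrative assumption, not the paper's exact formulation.

```python
def lagrange_interpolate(points, x):
    """Evaluate the Lagrange interpolating polynomial through `points`
    (a list of (x_i, y_i) pairs with distinct x_i) at position x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Basis polynomial L_i(x): 1 at x_i, 0 at every other x_j.
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total
```

For gait synthesis, interpolating between keyframe joint angles in this way yields a smooth intermediate motion; the fuzzy motion controller would then supply the revised angle at which the interpolated trajectory is evaluated.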