Fu Jian, Teng Xiang, Cao Ce, Ju Zhaojie, Lou Ping
IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4553-4564. doi: 10.1109/TNNLS.2020.3021530. Epub 2021 Oct 5.
Recent results in learning from demonstration (LfD) show that reinforcement learning is effective in helping robots improve their movement skills. The main remaining challenge is how to automatically generate new robot motions for new tasks that share a similar preassigned performance indicator but differ from the demonstrated tasks. To address this issue, this article proposes a framework for policy representation, imitation learning, and optimization in intelligent robot trajectory planning, based on improved locally weighted regression (iLWR) and policy improvement with path integral by dual perturbation (PI-DP). In addition, reward-guided weight search and adaptive evolution of the basis functions are performed alternately in two spaces, namely the basis function space and the weight space. This alternating learning process constructs a sequence of two-tuples that link the demonstration task to the new task for motor skill transfer, so that the robot gradually acquires motor skills, progressing from tasks similar to the demonstration to dissimilar tasks with different performance metrics. Classical via-point trajectory planning experiments are performed with a SCARA manipulator, a 10-degree-of-freedom (DOF) planar robot, and a UR robot. The results show that the proposed method is both feasible and effective.
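The abstract names two standard ingredients that the proposed method builds on: a locally weighted regression (LWR) policy representation learned from a demonstration, and path-integral policy improvement that searches the weight space using a task reward. The sketch below is not the authors' iLWR or PI-DP. It is a minimal illustration of the plain building blocks under stated assumptions: a 1-D trajectory over a phase variable s in [0, 1]; Gaussian basis functions with fixed centers and a fixed width (the adaptive evolution of the basis functions described in the abstract is omitted); LWR with a locally constant model per basis function; and a simplified episodic PI2-style update that perturbs only the weights (a single perturbation, not the dual perturbation of PI-DP). The via-point task, the cost function, and all names and parameter values (`n_basis`, `width`, `sigma`, `h`, `run_demo`) are hypothetical.

```python
import math
import random


def gaussian_basis(s, centers, width):
    """Unnormalized Gaussian basis activations psi_i(s)."""
    return [math.exp(-((s - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def lwr_fit(phases, targets, centers, width):
    """LWR with a locally constant model per basis function:
    w_i = sum_t psi_i(s_t) * y_t / sum_t psi_i(s_t)."""
    num = [0.0] * len(centers)
    den = [0.0] * len(centers)
    for s, y in zip(phases, targets):
        for i, psi in enumerate(gaussian_basis(s, centers, width)):
            num[i] += psi * y
            den[i] += psi
    return [n / d for n, d in zip(num, den)]


def reproduce(weights, phases, centers, width):
    """Blend the local models with normalized basis activations."""
    out = []
    for s in phases:
        psi = gaussian_basis(s, centers, width)
        out.append(sum(p * w for p, w in zip(psi, weights)) / sum(psi))
    return out


def via_point_cost(weights, phases, centers, width, via_idx, via_value, goal):
    """Hypothetical task cost: squared via-point error plus squared goal error."""
    traj = reproduce(weights, phases, centers, width)
    return (traj[via_idx] - via_value) ** 2 + (traj[-1] - goal) ** 2


def pi2_update(weights, cost_fn, rng, n_rollouts=10, sigma=0.05, h=10.0):
    """One simplified episodic PI2-style step: perturb the weights, score each
    rollout, then average the noise with weights exp(-h * normalized cost)."""
    noises, costs = [], []
    for _ in range(n_rollouts):
        eps = [rng.gauss(0.0, sigma) for _ in weights]
        noises.append(eps)
        costs.append(cost_fn([w + e for w, e in zip(weights, eps)]))
    j_min, j_max = min(costs), max(costs)
    span = max(j_max - j_min, 1e-12)  # guard against identical costs
    probs = [math.exp(-h * (j - j_min) / span) for j in costs]
    total = sum(probs)
    probs = [p / total for p in probs]
    return [
        w + sum(p * eps[i] for p, eps in zip(probs, noises))
        for i, w in enumerate(weights)
    ]


def run_demo(n_iters=100, seed=0):
    rng = random.Random(seed)
    n_basis, width = 10, 0.08
    centers = [i / (n_basis - 1) for i in range(n_basis)]
    phases = [t / 100 for t in range(101)]
    demo = list(phases)  # hypothetical demonstration: straight line y(s) = s
    w0 = lwr_fit(phases, demo, centers, width)  # imitation step

    def cost(w):
        # Hypothetical new task: pass through 0.8 at s = 0.5 and end at 1.0.
        return via_point_cost(w, phases, centers, width, 50, 0.8, 1.0)

    w = list(w0)
    for _ in range(n_iters):  # reward-guided search in the weight space
        w = pi2_update(w, cost, rng)
    return w0, cost(w0), cost(w), reproduce(w0, phases, centers, width)
```

Calling `run_demo()` first imitates the straight-line demonstration with LWR and then lets the reward-weighted updates bend the trajectory toward the new via-point, so the final cost should fall well below the cost of the imitated trajectory. Averaging the exploration noise with exponentiated, range-normalized costs is the usual PI2 design choice: it needs no gradient of the cost and only one temperature-like parameter `h`.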