Bing Zhenshan, Brucker Matthias, Morin Fabrice O, Li Rui, Su Xiaojie, Huang Kai, Knoll Alois
IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7863-7876. doi: 10.1109/TNNLS.2021.3088947. Epub 2022 Nov 30.
Reinforcement learning algorithms such as hindsight experience replay (HER) and hindsight goal generation (HGG) have been able to solve challenging robotic manipulation tasks in multigoal settings with sparse rewards. HER achieves its training success by replaying past experience in hindsight with heuristically chosen goals, but it underperforms in challenging tasks in which the goals are difficult to explore. HGG enhances HER by selecting intermediate goals that are easy to achieve in the short term and promising to lead to the target goals in the long term. This guided exploration makes HGG applicable to tasks in which the target goals are far away from the object's initial position. However, vanilla HGG is not applicable to manipulation tasks with obstacles, because the Euclidean metric it relies on is not an accurate distance metric in such environments. Grid-based HGG can solve manipulation tasks with obstacles when guided by a handcrafted distance grid, but a method that solves such tasks automatically, without handcrafted guidance, is still needed. In this article, we propose graph-based hindsight goal generation (G-HGG), an extension of HGG that selects hindsight goals based on shortest distances in an obstacle-avoiding graph, a discrete representation of the environment. We evaluated G-HGG on four challenging manipulation tasks with obstacles, where it shows significant improvements in both sample efficiency and overall success rate over HGG and HER. Videos can be viewed at https://videoviewsite.wixsite.com/ghgg.
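The abstract's central point is that in a cluttered workspace, the Euclidean distance between two positions underestimates how far apart they really are, whereas a shortest path through an obstacle-avoiding graph does not. The sketch below illustrates that idea only. It assumes a 2D workspace, axis-aligned box obstacles, an 8-connected grid as the discrete graph, and Dijkstra's algorithm for shortest paths. The helper names (`build_obstacle_avoiding_graph`, `dijkstra`, `graph_distance`) and the grid step are hypothetical. The abstract does not specify how G-HGG actually builds its graph or how it uses the distances inside goal selection.

```python
import heapq
import itertools
import math


def _blocked(p, obstacles):
    # Hypothetical obstacle format: axis-aligned boxes (x_min, y_min, x_max, y_max).
    return any(x0 <= p[0] <= x1 and y0 <= p[1] <= y1 for (x0, y0, x1, y1) in obstacles)


def build_obstacle_avoiding_graph(x_range, y_range, step, obstacles):
    """Grid vertices outside all obstacles, 8-connected, Euclidean edge weights.

    An edge is dropped if its midpoint lies inside an obstacle; this is a coarse
    collision check that assumes obstacles are thicker than `step`.
    """
    nx = int(round((x_range[1] - x_range[0]) / step)) + 1
    ny = int(round((y_range[1] - y_range[0]) / step)) + 1
    nodes = {}
    for i, j in itertools.product(range(nx), range(ny)):
        p = (x_range[0] + i * step, y_range[0] + j * step)
        if not _blocked(p, obstacles):
            nodes[(i, j)] = p
    edges = {key: [] for key in nodes}
    for (i, j), p in nodes.items():
        for di, dj in itertools.product((-1, 0, 1), repeat=2):
            q_key = (i + di, j + dj)
            if (di, dj) == (0, 0) or q_key not in nodes:
                continue
            q = nodes[q_key]
            mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
            if not _blocked(mid, obstacles):
                edges[(i, j)].append((q_key, math.dist(p, q)))
    return nodes, edges


def dijkstra(edges, source):
    """Shortest-path distance from `source` to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in edges[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def graph_distance(nodes, edges, p, q):
    """Snap p and q to their nearest vertices and return the shortest-path length."""
    def snap(x):
        return min(nodes, key=lambda k: math.dist(nodes[k], x))

    return dijkstra(edges, snap(p)).get(snap(q), math.inf)


if __name__ == "__main__":
    wall = [(0.45, 0.0, 0.55, 0.75)]  # hypothetical wall between start and goal
    nodes, edges = build_obstacle_avoiding_graph((0.0, 1.0), (0.0, 1.0), 0.1, wall)
    start, goal = (0.2, 0.2), (0.8, 0.2)
    print(math.dist(start, goal))                     # straight-line distance, about 0.6
    print(graph_distance(nodes, edges, start, goal))  # longer: the path must go around the wall
```

Without the wall, both measures agree along a straight grid line. With the wall, the graph distance is more than twice the Euclidean one, because any path has to detour over the top of the wall. Under these assumptions, this is the kind of discrepancy that would mislead a Euclidean goal-selection criterion. In practice, shortest distances from every vertex would likely be precomputed once (for example, with all-pairs Dijkstra) rather than recomputed per query as in this sketch.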