Kim Taeyoung, Kang Taemin, Jeong Haechan, Har Dongsoo
CCS Graduate School of Mobility, Korea Advanced Institute of Science & Technology, Daejeon, Republic of Korea.
The Robotics Program, Korea Advanced Institute of Science & Technology, Daejeon, Republic of Korea.
PeerJ Comput Sci. 2024 Dec 12;10:e2588. doi: 10.7717/peerj-cs.2588. eCollection 2024.
In a multi-goal reinforcement learning environment, an agent learns a policy for tasks with multiple goals from experiences gained through exploration. In environments with sparse binary rewards, the replay buffer contains few successful experiences, which hampers sampling efficiency. To address this, Hindsight Experience Replay (HER) generates successful experiences, called hindsight experiences, from unsuccessful ones. However, uniformly sampling experiences for HER can generate hindsight experiences inefficiently. In this paper, a novel method called Failed goal Aware HER (FAHER) is proposed to enhance sampling efficiency. The method accounts for the properties of achieved goals with respect to failed goals during sampling: a clustering model groups the episodes in the replay buffer, and experiences are then sampled from the clusters in the manner of HER. The proposed method is validated on three robotic control tasks from OpenAI Gym. The experimental results demonstrate that the proposed method is more sample-efficient and achieves better performance than baseline approaches.
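The abstract outlines two mechanisms: HER's goal relabeling, which turns failed trajectories into successful transitions, and FAHER's cluster-aware sampling over failed goals. Below is a minimal sketch in Python, assuming episodes are stored as (observation, action, achieved_goal, desired_goal) tuples and the reward is sparse (0 on success, -1 otherwise); the function names, the HER "future" relabeling strategy, and the use of KMeans as the cluster model are assumptions for illustration, since the abstract does not specify the paper's exact cluster model or sampler.

```python
import numpy as np
from sklearn.cluster import KMeans

def relabel_with_hindsight(episode, reward_fn, k=4, rng=None):
    """HER 'future'-style relabeling (an assumed strategy): replace the
    desired goal of each transition with goals achieved later in the same
    episode, so a failed episode yields successful transitions."""
    rng = rng if rng is not None else np.random.default_rng()
    relabeled = []
    T = len(episode)
    for t, (obs, action, achieved, desired) in enumerate(episode):
        # Sample up to k achieved goals from the remainder of the episode.
        for i in rng.integers(t, T, size=min(k, T - t)):
            substitute_goal = episode[i][2]                 # achieved goal at step i
            reward = reward_fn(achieved, substitute_goal)   # 0 if reached, -1 otherwise
            relabeled.append((obs, action, substitute_goal, reward))
    return relabeled

def cluster_failed_episodes(failed_goals, n_clusters=3):
    """Cluster episodes by their failed (desired but unachieved) goals.
    KMeans is a stand-in for the paper's unspecified cluster model; the
    labels let a sampler draw across clusters instead of uniformly."""
    model = KMeans(n_clusters=n_clusters, n_init=10)
    return model.fit_predict(np.asarray(failed_goals))

def sample_balanced(episodes, labels, batch_size, rng=None):
    """Draw episodes so each failed-goal cluster is represented, sketching
    cluster-aware (rather than uniform) episode sampling."""
    rng = rng if rng is not None else np.random.default_rng()
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    per_cluster = max(1, batch_size // len(clusters))
    picks = []
    for c in clusters:
        idx = np.flatnonzero(labels == c)
        picks.extend(rng.choice(idx, size=per_cluster, replace=True))
    return [episodes[i] for i in picks]
```

In a full training loop one would relabel stored episodes with relabel_with_hindsight and periodically refit cluster_failed_episodes as new failures accumulate; the per-cluster quota in sample_balanced is the knob that replaces uniform sampling.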