

Clustering-based Failed goal Aware Hindsight Experience Replay

Authors

Kim Taeyoung, Kang Taemin, Jeong Haechan, Har Dongsoo

Affiliations

CCS Graduate School of Mobility, Korea Advanced Institute of Science & Technology, Daejeon, Republic of Korea.

The Robotics Program, Korea Advanced Institute of Science & Technology, Daejeon, Republic of Korea.

Publication

PeerJ Comput Sci. 2024 Dec 12;10:e2588. doi: 10.7717/peerj-cs.2588. eCollection 2024.

DOI: 10.7717/peerj-cs.2588
PMID: 39896403
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11784800/
Abstract

In a multi-goal reinforcement learning environment, an agent learns a policy to perform tasks with multiple goals from experiences gained through exploration. In environments with sparse binary rewards, the replay buffer contains few successful experiences, posing a challenge for sampling efficiency. To address this, Hindsight Experience Replay (HER) generates successful experiences, named hindsight experiences, from unsuccessful ones. However, uniform sampling of experiences for the process of HER can lead to inefficient scenarios of generating hindsight experience. In this paper, a novel method called Failed goal Aware HER (FAHER) is proposed to enhance sampling efficiency. This method considers the properties of achieved goals with respect to failed goals during sampling. To account for these properties, a cluster model is used to cluster episodes in the replay buffer, and experiences are subsequently sampled in the manner of HER. The proposed method is validated through experiments on three robotic control tasks from the OpenAI Gym. The experimental results demonstrate that the proposed method is more sample-efficient and achieves improved performance over baseline approaches.
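The abstract describes two mechanisms working together: hindsight relabeling of failed episodes with goals that were actually achieved, and a cluster model that groups episodes in the replay buffer by their failed goals before sampling. The sketch below illustrates that combination in minimal form. It is not the authors' implementation: the "final" relabeling strategy, the plain k-means routine standing in for the paper's cluster model, and all field names (`goal`, `achieved`, `states`, `actions`) are illustrative assumptions.

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    """Minimal k-means, standing in for the paper's cluster model."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

def relabel_with_hindsight(episodes, k=2, seed=0):
    """Cluster failed episodes by their intended (failed) goal, then relabel
    each transition with the goal the episode actually achieved at its end
    (the 'final' hindsight strategy), using a sparse binary reward."""
    failed_goals = np.array([ep["goal"] for ep in episodes])
    labels, _ = kmeans(failed_goals, k, seed=seed)
    hindsight = []
    for ep, label in zip(episodes, labels):
        achieved_final = ep["achieved"][-1]  # substitute goal for the whole episode
        for s, a, ag in zip(ep["states"], ep["actions"], ep["achieved"]):
            # Reward 0 when the achieved goal matches the relabeled goal, else -1.
            reward = 0.0 if np.allclose(ag, achieved_final) else -1.0
            hindsight.append({"state": s, "action": a, "goal": achieved_final,
                              "reward": reward, "cluster": int(label)})
    return hindsight
```

A sampler built on this could then draw experiences per cluster rather than uniformly over the buffer, which is the property the paper exploits to avoid inefficient hindsight-generation scenarios.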


Figure (from article): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47d3/11784800/fcfad8b9e5ab/peerj-cs-10-2588-g003.jpg

