Kim Byeongjun, Kwon Gunam, Park Chaneun, Kwon Nam Kyu
Department of Electronic Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea.
School of Electronics Engineering, Kyungpook National University, Daegu 41566, Republic of Korea.
Biomimetics (Basel). 2023 Jun 6;8(2):240. doi: 10.3390/biomimetics8020240.
This paper proposes a reinforcement learning algorithm based on task decomposition and a dedicated reward system for the Pick-and-Place task, one of the high-level tasks of robot manipulators. The proposed method decomposes the Pick-and-Place task into three subtasks: two reaching tasks and one grasping task. One reaching task approaches the object, and the other reaches the place position. These two reaching tasks are carried out using the optimal policies of agents trained with Soft Actor-Critic (SAC). Unlike the two reaching tasks, grasping is implemented with simple logic that is easy to design but may result in improper gripping. To support proper grasping, a dedicated reward system for approaching the object is designed using individual axis-based weights. To verify the validity of the proposed method, we carry out various experiments in the MuJoCo physics engine with the Robosuite framework. According to the simulation results of four trials, the robot manipulator picked up and released the object at the goal position with an average success rate of 93.2%.
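The axis-weighted approach reward mentioned in the abstract could be sketched as follows. This is a minimal illustration under assumed values: the specific weights, the `tanh` shaping, and the function name `approach_reward` are hypothetical and not taken from the paper.

```python
import numpy as np

def approach_reward(eef_pos, obj_pos, weights=(1.0, 1.0, 2.0)):
    """Hypothetical axis-weighted reaching reward.

    Penalizes the end-effector's distance to the object separately
    along each Cartesian axis, so that (for example) vertical
    alignment can be emphasized before the gripper closes.
    """
    w = np.asarray(weights, dtype=float)
    diff = np.asarray(eef_pos, dtype=float) - np.asarray(obj_pos, dtype=float)
    # Weighted Euclidean distance; a larger weight pulls harder on that axis
    dist = np.sqrt(np.sum(w * diff ** 2))
    # Dense shaping reward in (0, 1]: 1 when aligned, decaying with distance
    return 1.0 - np.tanh(10.0 * dist)
```

Weighting the vertical axis more heavily, as assumed here, would encourage the agent to align above the object before descending, which is one plausible way such a reward could assist a simple grasping logic.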