Han Changlin, Peng Zhiyong, Liu Yadong, Tang Jingsheng, Yu Yang, Zhou Zongtan
Department of Intelligence Science and Technology, College of Intelligence Science, National University of Defense Technology, Changsha, China.
Front Neurorobot. 2023 Mar 7;17:1089270. doi: 10.3389/fnbot.2023.1089270. eCollection 2023.
Reinforcement learning (RL) empowers an agent to learn robotic manipulation skills autonomously. Compared with traditional single-goal RL, semantic-goal-conditioned RL expands the agent's capacity to accomplish multiple semantic manipulation instructions. However, because semantic goals are sparsely distributed and agent-environment interactions yield sparse rewards, a hard exploration problem arises and impedes agent training. In traditional RL, curiosity-motivated exploration is effective at solving the hard exploration problem. In semantic-goal-conditioned RL, however, the performance of previous curiosity-motivated methods deteriorates, which we attribute to two defects: uncontrollability and distraction. To address these defects, we propose a conservative curiosity-motivated method named mutual information motivation with hybrid policy mechanism (MIHM). MIHM contributes two main innovations: a decoupled-mutual-information-based intrinsic motivation, which prevents the agent from being driven toward dangerous states by uncontrollable curiosity, and a precisely trained, automatically switched hybrid policy mechanism, which eliminates the distraction from the curiosity-motivated policy and achieves an optimal trade-off between exploration and exploitation. Compared with four state-of-the-art curiosity-motivated methods on a sparse-reward robotic manipulation task with 35 valid semantic goals, including stacks of two or three objects and pyramids, MIHM shows the fastest learning speed. Moreover, MIHM achieves the highest total success rate, 0.9, versus at most 0.6 for the other methods. Of all the methods evaluated, MIHM is the only one that succeeds in stacking three objects.
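To make the abstract's notion of mutual-information-based intrinsic motivation concrete, the sketch below shows a standard discriminator-based variational lower bound on the mutual information I(g; s) between semantic goals and visited states, used as an intrinsic reward. This is a generic DIAYN-style estimator for illustration only, not the paper's decoupled MIHM formulation; all names here (GoalDiscriminator, intrinsic_reward) are hypothetical.

```python
# Illustrative sketch, assuming a discrete set of semantic goals and a
# DIAYN-style discriminator q(g | s). Not the authors' actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GoalDiscriminator(nn.Module):
    """Predicts which semantic goal g a state s corresponds to: q(g | s)."""
    def __init__(self, state_dim: int, num_goals: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_goals),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # unnormalized logits over goals

def intrinsic_reward(disc: GoalDiscriminator,
                     state: torch.Tensor,
                     goal_idx: torch.Tensor,
                     num_goals: int) -> torch.Tensor:
    """Variational lower bound on I(g; s):
    r_int = log q(g | s) - log p(g), with p(g) uniform over goals.
    High reward means the visited state is informative about its goal.
    """
    with torch.no_grad():
        log_q = F.log_softmax(disc(state), dim=-1)
        log_q_g = log_q.gather(-1, goal_idx.unsqueeze(-1)).squeeze(-1)
        log_p_g = -torch.log(torch.tensor(float(num_goals)))
        return log_q_g - log_p_g
```

In a hybrid scheme of the kind the abstract describes, such an intrinsic term would drive a separate exploration policy, with control handed to a pure exploitation policy once the sparse task reward becomes reachable; MIHM's precise decoupling and switching criteria are defined in the paper itself.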