Department of Computer Science, Kyonggi University, Suwon-si 16227, Korea.
Sensors (Basel). 2021 May 13;21(10):3409. doi: 10.3390/s21103409.
This study proposes a novel hybrid imitation learning (HIL) framework that combines behavior cloning (BC) and state cloning (SC) in a mutually complementary manner to make robotic manipulation task learning more efficient. The framework merges the BC and SC losses through an adaptive loss mixing method. It uses pretrained dynamics networks to improve SC efficiency, and it performs stochastic state recovery, which moves the learner's task state onto a demo state on the demo task trajectory during SC, to keep policy network training stable. The training efficiency and policy flexibility of the HIL framework are demonstrated in a series of experiments on major robotic manipulation tasks (pick-up, pick-and-place, and stack). In these experiments, the HIL framework improved performance about 2.6-fold over pure BC and trained about four times faster than pure SC. It also improved performance about 1.6-fold and trained about 2.2 times faster than another hybrid method that combines BC with reinforcement learning (BC + RL).
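As a rough illustration of how a BC loss and an SC loss might be mixed, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the adaptive mixing rule, so `mixing_weight` is a hypothetical stand-in that weights each loss by its share of the total; the function names, the mean-squared-error losses, and the toy additive dynamics model are likewise assumptions made for illustration.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bc_loss(policy_action: Vector, demo_action: Vector) -> float:
    """Behavior cloning: match the demonstrator's action."""
    return mse(policy_action, demo_action)


def sc_loss(
    dynamics: Callable[[Vector, Vector], Vector],
    state: Vector,
    policy_action: Vector,
    demo_next_state: Vector,
) -> float:
    """State cloning: match the demonstrator's next state, using a
    (pretrained) dynamics model to predict where the action leads."""
    predicted_next_state = dynamics(state, policy_action)
    return mse(predicted_next_state, demo_next_state)


def mixing_weight(l_bc: float, l_sc: float, eps: float = 1e-8) -> float:
    """Hypothetical adaptive rule (an assumption, not from the paper):
    the BC weight is BC's share of the total loss."""
    return l_bc / (l_bc + l_sc + eps)


def hybrid_loss(
    dynamics: Callable[[Vector, Vector], Vector],
    state: Vector,
    policy_action: Vector,
    demo_action: Vector,
    demo_next_state: Vector,
) -> float:
    """Weighted sum w * L_BC + (1 - w) * L_SC with an adaptive w."""
    l_bc = bc_loss(policy_action, demo_action)
    l_sc = sc_loss(dynamics, state, policy_action, demo_next_state)
    w = mixing_weight(l_bc, l_sc)
    return w * l_bc + (1.0 - w) * l_sc


def toy_dynamics(state: Vector, action: Vector) -> Vector:
    """Toy stand-in for a pretrained dynamics network: next = state + action."""
    return [s + a for s, a in zip(state, action)]


if __name__ == "__main__":
    loss = hybrid_loss(
        toy_dynamics,
        state=[0.0, 0.0],
        policy_action=[1.0, 0.0],
        demo_action=[1.0, 1.0],
        demo_next_state=[1.0, 1.0],
    )
    print(f"hybrid loss: {loss:.4f}")
```

In a real training loop the policy and dynamics models would be neural networks and the losses would be differentiated with respect to the policy parameters; the sketch only shows how the two loss terms relate.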