
Leveraging Imitation Learning on Pose Regulation Problem of a Robotic Fish

Authors

Zhang Tianhao, Yue Lu, Wang Chen, Sun Jinan, Zhang Shikun, Wei Airong, Xie Guangming

Publication

IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4232-4245. doi: 10.1109/TNNLS.2022.3202075. Epub 2024 Feb 29.

Abstract

In this article, the pose regulation control problem of a robotic fish is investigated by formulating it as a Markov decision process (MDP). Such a task, which requires the robot to arrive at the desired position with the desired orientation, remains a challenge, since the two objectives (position and orientation) may conflict during optimization. To handle this challenge, we adopt a sparse reward scheme: the robot is rewarded if and only if it completes the pose regulation task. Although deep reinforcement learning (DRL) can solve such an MDP with sparse rewards, the absence of immediate rewards hinders the robot from learning efficiently. To this end, we propose a novel imitation learning (IL) method that learns DRL-based policies from demonstrations with inverse reward shaping to overcome the challenge raised by extremely sparse rewards. Moreover, we design a demonstrator that generates varied trajectory demonstrations from a single simple example provided by a nonexpert helper, which greatly reduces the time spent collecting robot samples. Simulation results demonstrate the effectiveness of the proposed demonstrator and the state-of-the-art (SOTA) performance of the proposed IL method. Furthermore, we deploy the trained IL policy on a physical robotic fish to perform pose regulation in a swimming tank, both with and without external disturbances. The experimental results verify the effectiveness and robustness of the proposed methods in the real world. We therefore believe this article is a step forward in the field of biomimetic underwater robot learning.
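The sparse reward scheme described above can be made concrete with a minimal sketch. This is not the paper's implementation; the state layout, tolerance values, and reward magnitude are all assumptions for illustration. The key property is that the agent receives a reward only when both the position and the orientation objectives are satisfied simultaneously, so neither objective can be optimized at the other's expense for partial credit.

```python
import numpy as np

POS_TOL = 0.05  # position tolerance in meters (assumed value)
ANG_TOL = 0.10  # orientation tolerance in radians (assumed value)

def sparse_pose_reward(pos, yaw, goal_pos, goal_yaw):
    """Return 1.0 iff the full pose regulation task is complete, else 0.0.

    pos, goal_pos: 2-D positions of the robot and the goal.
    yaw, goal_yaw: headings in radians.
    """
    pos_err = np.linalg.norm(np.asarray(pos, dtype=float) - np.asarray(goal_pos, dtype=float))
    # Wrap the heading error into [-pi, pi] so that e.g. 2*pi counts as aligned.
    ang_err = abs((yaw - goal_yaw + np.pi) % (2 * np.pi) - np.pi)
    return 1.0 if (pos_err < POS_TOL and ang_err < ANG_TOL) else 0.0
```

With such a reward, almost every transition returns 0.0, which is exactly why the abstract argues that plain DRL learns inefficiently here and why demonstrations with inverse reward shaping are introduced.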

