Graduate Aerospace Laboratories, California Institute of Technology, 1200 E California Blvd, Pasadena, CA, 91125, USA.
Computational Science and Engineering Laboratory, ETH Zurich, 8093, Zurich, Switzerland.
Nat Commun. 2021 Dec 8;12(1):7143. doi: 10.1038/s41467-021-27015-y.
Efficient point-to-point navigation in the presence of a background flow field is important for robotic applications such as ocean surveying. In such applications, robots may only have knowledge of their immediate surroundings or be faced with time-varying currents, which limits the use of optimal control techniques. Here, we apply a recently introduced reinforcement learning algorithm to discover time-efficient navigation policies that steer a fixed-speed swimmer through unsteady two-dimensional flow fields. The algorithm feeds environmental cues into a deep neural network that determines the swimmer's actions, and it uses Remember and Forget Experience Replay. We find that the resulting swimmers successfully exploit the background flow to reach the target, but that this success depends on the sensed environmental cue. Surprisingly, a velocity-sensing approach significantly outperformed a bio-mimetic vorticity-sensing approach, achieving a near-100% success rate in reaching the target locations while approaching the time-efficiency of optimal navigation trajectories.
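To make the navigation setup concrete, the following is a minimal Python sketch of a fixed-speed swimmer advected by an unsteady two-dimensional flow. It is an illustration only, under assumptions not taken from the abstract: `background_flow` is a hypothetical analytic cellular flow, the swimmer is assumed to be faster than the flow everywhere, and `naive_policy` simply points at the target. It stands in for, and is not, the learned neural-network policy; the function names, time step, and target radius are likewise illustrative.

```python
import math


def background_flow(x, y, t):
    """Hypothetical unsteady 2D flow: a cellular vortex pattern drifting in x.

    Assumption for illustration only; peak flow speed is 0.5.
    """
    k = math.pi
    phase = x - 0.1 * t  # slow drift makes the field time-varying
    u = 0.5 * math.sin(k * phase) * math.cos(k * y)
    v = -0.5 * math.cos(k * phase) * math.sin(k * y)
    return u, v


def step(x, y, t, heading, swim_speed=1.0, dt=0.01):
    """Advance the swimmer one explicit Euler step.

    The swimmer's velocity is its fixed-speed swimming velocity in the
    chosen heading direction plus the local background flow velocity.
    """
    u, v = background_flow(x, y, t)
    x_new = x + (swim_speed * math.cos(heading) + u) * dt
    y_new = y + (swim_speed * math.sin(heading) + v) * dt
    return x_new, y_new, t + dt


def naive_policy(x, y, target):
    """Baseline action: always swim straight toward the target."""
    return math.atan2(target[1] - y, target[0] - x)


def navigate(start, target, radius=0.05, max_steps=5000):
    """Run one episode; return (reached_target, elapsed_time)."""
    x, y = start
    t = 0.0
    for _ in range(max_steps):
        if math.hypot(target[0] - x, target[1] - y) < radius:
            return True, t
        heading = naive_policy(x, y, target)
        x, y, t = step(x, y, t, heading)
    return False, t


if __name__ == "__main__":
    reached, elapsed = navigate(start=(0.0, 0.0), target=(2.0, 1.0))
    print(f"reached={reached}, time={elapsed:.2f}")
```

In a reinforcement learning setting, `naive_policy` would be replaced by a policy that maps a locally sensed cue (for example the flow velocity or vorticity at the swimmer's position) to a heading, and the episode outcome and elapsed time would inform the reward.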