Flato Yoav, Harel Roi, Tamar Aviv, Nathan Ran, Beatus Tsevi
Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel.
Department of Ecology, Evolution, and Behavior, Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel.
Nat Commun. 2024 Jun 10;15(1):4942. doi: 10.1038/s41467-024-48670-x.
Thermal soaring, a technique used by birds and gliders to exploit updrafts of hot air, is an appealing model problem for studying motion control and how it is learned by animals and engineered autonomous systems. Thermal soaring has rich dynamics and nontrivial constraints, yet it uses few control parameters and is becoming experimentally accessible. Following recent developments in applying reinforcement learning to train deep neural-network (deep-RL) models to soar autonomously, both in simulation and in real gliders, we develop a simulation-based deep-RL system to study the learning process of thermal soaring. We find that this process has learning bottlenecks; we define a new efficiency metric and use it to characterize learning robustness; we compare the learned policy to data from soaring vultures; and we find that the neurons of the trained network divide into function clusters that evolve during learning. These results pose thermal soaring as a rich yet tractable model problem for the learning of motion control.