Song Yunlong, Romero Angel, Müller Matthias, Koltun Vladlen, Scaramuzza Davide
University of Zurich, Zurich, Switzerland.
Intel Labs, Jackson, WY, USA.
Sci Robot. 2023 Sep 27;8(82):eadg1462. doi: 10.1126/scirobotics.adg1462. Epub 2023 Sep 13.
A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contributed to the success of RL or have limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective. OC decomposes the problem into planning and control with an explicit intermediate representation, such as a trajectory, that serves as an interface. This decomposition limits the range of behaviors that can be expressed by the controller, leading to inferior control performance when facing unmodeled effects. In contrast, RL can directly optimize a task-level objective and can leverage domain randomization to cope with model uncertainty, allowing the discovery of more robust control responses. Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour. Our policy achieved superhuman control within minutes of training on a standard workstation. This work presents a milestone in agile robotics and sheds light on the role of RL and OC in robot control.
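The abstract notes that RL can "leverage domain randomization to cope with model uncertainty." A minimal sketch of that idea is below; the parameter names, ranges, and the `randomize_dynamics` helper are illustrative assumptions, not the paper's actual training setup.

```python
import random

# Hypothetical sketch of domain randomization for RL training.
# The nominal dynamics parameters and the +/-20% spread are
# illustrative assumptions, not values from the paper.
NOMINAL = {"mass_kg": 0.75, "thrust_to_weight": 4.0, "drag_coeff": 0.3}


def randomize_dynamics(nominal, spread=0.2, rng=random):
    """Sample a perturbed dynamics model for one training episode.

    Each parameter is scaled by a factor drawn uniformly from
    [1 - spread, 1 + spread], so the policy trains against a family
    of plausible models rather than a single nominal one, which is
    what lets it discover control responses robust to model error.
    """
    return {k: v * rng.uniform(1.0 - spread, 1.0 + spread)
            for k, v in nominal.items()}


def training_models(num_episodes, nominal=NOMINAL):
    """Yield one randomized dynamics model per training episode."""
    for _ in range(num_episodes):
        params = randomize_dynamics(nominal)
        # simulate_episode(policy, params)  # placeholder for the RL rollout
        yield params
```

In a full pipeline, each sampled model would parameterize the simulator for one rollout, so the policy never overfits to any single (necessarily imperfect) dynamics model.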