
Reaching the limit in autonomous racing: Optimal control versus reinforcement learning.

Authors

Song Yunlong, Romero Angel, Müller Matthias, Koltun Vladlen, Scaramuzza Davide

Affiliations

University of Zurich, Zurich, Switzerland.

Intel Labs, Jackson, WY, USA.

Publication

Sci Robot. 2023 Sep 27;8(82):eadg1462. doi: 10.1126/scirobotics.adg1462. Epub 2023 Sep 13.

DOI: 10.1126/scirobotics.adg1462
PMID: 37703383
Abstract

A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contributed to the success of RL or have limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective. OC decomposes the problem into planning and control with an explicit intermediate representation, such as a trajectory, that serves as an interface. This decomposition limits the range of behaviors that can be expressed by the controller, leading to inferior control performance when facing unmodeled effects. In contrast, RL can directly optimize a task-level objective and can leverage domain randomization to cope with model uncertainty, allowing the discovery of more robust control responses. Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour. Our policy achieved superhuman control within minutes of training on a standard workstation. This work presents a milestone in agile robotics and sheds light on the role of RL and OC in robot control.
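The distinction the abstract draws can be sketched in code: optimal control tracks an intermediate representation (a pre-planned trajectory), while RL directly optimizes a task-level objective and uses domain randomization to cope with model uncertainty. The following is a minimal illustrative sketch, not the paper's implementation; the parameter names, ranges, and reward weights are invented for demonstration.

```python
import random

# Nominal quadrotor model parameters (invented values for illustration).
NOMINAL = {"mass_kg": 0.75, "thrust_to_weight": 5.0, "drag_coeff": 0.30}

def randomize_dynamics(nominal, spread=0.2, rng=None):
    """Domain randomization: sample per-episode dynamics parameters
    within +/- spread of the nominal model, so the learned policy must
    be robust to model mismatch rather than overfit to one model."""
    rng = rng or random.Random(0)
    return {k: v * rng.uniform(1.0 - spread, 1.0 + spread)
            for k, v in nominal.items()}

def task_reward(gates_passed, lap_time_s, crashed):
    """Task-level objective: reward race progress directly, penalize
    elapsed time and crashes. No intermediate trajectory is imposed,
    so the policy may discover any behavior that advances the task."""
    return 10.0 * gates_passed - 0.1 * lap_time_s - (100.0 if crashed else 0.0)

def tracking_cost(state, reference_state):
    """Contrast: an OC-style cost penalizes deviation from a planned
    reference trajectory, restricting behavior to what the planner
    expressed in that intermediate representation."""
    return sum((s - r) ** 2 for s, r in zip(state, reference_state))
```

In this framing, each RL training episode would call `randomize_dynamics` to perturb the simulator and accumulate `task_reward`, whereas a classical pipeline would plan a reference and minimize `tracking_cost` online.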


Similar Articles

1. Reaching the limit in autonomous racing: Optimal control versus reinforcement learning.
   Sci Robot. 2023 Sep 27;8(82):eadg1462. doi: 10.1126/scirobotics.adg1462. Epub 2023 Sep 13.
2. Visual attention prediction improves performance of autonomous drone racing agents.
   PLoS One. 2022 Mar 1;17(3):e0264471. doi: 10.1371/journal.pone.0264471. eCollection 2022.
3. Champion-level drone racing using deep reinforcement learning.
   Nature. 2023 Aug;620(7976):982-987. doi: 10.1038/s41586-023-06419-4. Epub 2023 Aug 30.
4. RL-DOVS: Reinforcement Learning for Autonomous Robot Navigation in Dynamic Environments.
   Sensors (Basel). 2022 May 19;22(10):3847. doi: 10.3390/s22103847.
5. Energy-efficient and damage-recovery slithering gait design for a snake-like robot based on reinforcement learning and inverse reinforcement learning.
   Neural Netw. 2020 Sep;129:323-333. doi: 10.1016/j.neunet.2020.05.029. Epub 2020 Jun 16.
6. AlphaPilot: autonomous drone racing.
   Auton Robots. 2022;46(1):307-320. doi: 10.1007/s10514-021-10011-y. Epub 2021 Oct 19.
7. Force-guided autonomous robotic ultrasound scanning control method for soft uncertain environment.
   Int J Comput Assist Radiol Surg. 2021 Dec;16(12):2189-2199. doi: 10.1007/s11548-021-02462-6. Epub 2021 Aug 9.
8. Robust walking control of a lower limb rehabilitation exoskeleton coupled with a musculoskeletal model via deep reinforcement learning.
   J Neuroeng Rehabil. 2023 Mar 19;20(1):34. doi: 10.1186/s12984-023-01147-2.
9. Human-Guided Reinforcement Learning With Sim-to-Real Transfer for Autonomous Navigation.
   IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14745-14759. doi: 10.1109/TPAMI.2023.3314762. Epub 2023 Nov 3.
10. MOSAIC for multiple-reward environments.
    Neural Comput. 2012 Mar;24(3):577-606. doi: 10.1162/NECO_a_00246. Epub 2011 Dec 14.

Cited By

1. Emergence of natural and robust bipedal walking by learning from biologically plausible objectives.
   iScience. 2025 Mar 11;28(4):112203. doi: 10.1016/j.isci.2025.112203. eCollection 2025 Apr 18.
2. Agile perching maneuvers in birds and morphing-wing drones.
   Nat Commun. 2024 Sep 27;15(1):8330. doi: 10.1038/s41467-024-52369-4.
3. On human-in-the-loop optimization of human-robot interaction.
   Nature. 2024 Sep;633(8031):779-788. doi: 10.1038/s41586-024-07697-2. Epub 2024 Sep 25.
4. Constrained trajectory optimization and force control for UAVs with universal jamming grippers.
   Sci Rep. 2024 May 25;14(1):11968. doi: 10.1038/s41598-024-62416-1.