Jenelten Fabian, He Junzhe, Farshidian Farbod, Hutter Marco
Robotic Systems Lab, ETH Zurich, 8092 Zurich, Switzerland.
Sci Robot. 2024 Jan 17;9(86):eadh5401. doi: 10.1126/scirobotics.adh5401.
Legged locomotion is a complex control problem that requires both accuracy and robustness to cope with real-world challenges. Legged systems have traditionally been controlled using trajectory optimization with inverse dynamics. Such hierarchical model-based methods are appealing because of intuitive cost function tuning, accurate planning, generalization, and, most importantly, the insightful understanding gained from more than a decade of extensive research. However, model mismatch and violation of assumptions are common sources of faulty operation. Simulation-based reinforcement learning, on the other hand, results in locomotion policies with unprecedented robustness and recovery skills. Yet, all learning algorithms struggle with sparse rewards emerging from environments where valid footholds are rare, such as gaps or stepping stones. In this work, we propose a hybrid control architecture that combines the advantages of both worlds to simultaneously achieve greater robustness, foot-placement accuracy, and terrain generalization. Our approach uses a model-based planner to roll out a reference motion during training. A deep neural network policy is trained in simulation, aiming to track the optimized footholds. We evaluated the accuracy of our locomotion pipeline on sparse terrains, where pure data-driven methods are prone to fail. Furthermore, we demonstrate superior robustness in the presence of slippery or deformable ground when compared with model-based counterparts. Finally, we show that our proposed tracking controller generalizes across different trajectory optimization methods not seen during training. In conclusion, our work unites the predictive capabilities and optimality guarantees of online planning with the inherent robustness attributed to offline learning.
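The hybrid architecture described above can be sketched in code: a model-based planner proposes reference footholds, and a learned policy maps the robot state plus those references to joint targets. This is a minimal illustrative sketch only; the planner, network shapes, and all names (`plan_footholds`, `TrackingPolicy`, 12 joint targets) are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def plan_footholds(base_pos, n_steps=4, stride=0.25):
    """Toy stand-in for a trajectory optimizer: nominal footholds
    spaced ahead of the base at a fixed stride (illustrative only)."""
    return np.array([base_pos + np.array([stride * (i + 1), 0.0])
                     for i in range(n_steps)])

class TrackingPolicy:
    """Placeholder MLP mapping (state, reference footholds) -> joint targets.
    Weights are random here; in the paper's setting such a policy would be
    trained in simulation to track the optimized footholds."""
    def __init__(self, obs_dim, act_dim, hidden=32):
        self.w1 = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, act_dim))

    def __call__(self, obs):
        return np.tanh(obs @ self.w1) @ self.w2

# One control step of the hybrid loop:
base_pos = np.zeros(2)
footholds = plan_footholds(base_pos)                 # model-based reference
obs = np.concatenate([base_pos, footholds.ravel()])  # state + reference
policy = TrackingPolicy(obs_dim=obs.size, act_dim=12)  # e.g., 12 joint targets
action = policy(obs)
print(action.shape)  # (12,)
```

The key design point the abstract emphasizes is the division of labor: the planner supplies accurate, optimality-driven foot placements, while the learned tracker absorbs model mismatch and disturbances at execution time.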