Inverse RL Scene Dynamics Learning for Nonlinear Predictive Control in Autonomous Vehicles.

Author information

Grigorescu Sorin M, Zaha Mihai V

Publication information

IEEE Trans Neural Netw Learn Syst. 2025 Aug;36(8):13754-13768. doi: 10.1109/TNNLS.2025.3549816.

Abstract

This article introduces the deep learning-based nonlinear model predictive controller with scene dynamics (DL-NMPC-SD) method for autonomous navigation. DL-NMPC-SD combines an a priori nominal vehicle model with a scene dynamics model learned from temporal range-sensing information. The scene dynamics model is responsible for estimating the desired vehicle trajectory and for adjusting the true system model used by the underlying model predictive controller. We propose to encode the scene dynamics model within the layers of a deep neural network, which acts as a nonlinear approximator for the high-order state space of the operating conditions. The model is learned from temporal sequences of range-sensing observations and system states, both integrated by an Augmented Memory component. We use inverse reinforcement learning (IRL) and the Bellman optimality principle to train our learning controller with a modified version of the deep Q-learning (DQL) algorithm, which enables us to estimate the desired state trajectory as an optimal action-value function. We have evaluated DL-NMPC-SD against the baseline dynamic window approach (DWA), as well as against two state-of-the-art methods, one End2End and one RL-based. Performance was measured in three experiments: 1) in our GridSim virtual environment; 2) on indoor and outdoor navigation tasks using our RovisLab autonomous mobile test unit (AMTU) platform; and 3) on a full-scale autonomous test vehicle driving on public roads.
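The abstract says the controller is trained with the Bellman optimality principle through a modified DQL, so that the desired state trajectory is estimated as an optimal action-value function. The sketch below illustrates only the Bellman optimality backup that DQL approximates, on a hypothetical tabular toy: a 1-D corridor with a hand-written reward. It is not the authors' method. There is no deep network, no Augmented Memory, no NMPC, and no reward inference (in IRL the reward would be recovered from demonstrations rather than specified by hand). All names (N_CELLS, step, bellman_targets, q_value_iteration, greedy_trajectory) are illustrative assumptions.

```python
import numpy as np

# HYPOTHETICAL toy setting (not from the article): a 1-D corridor of N cells,
# action 0 = step left, action 1 = step right, goal in the last cell (terminal).
N_CELLS = 6
GOAL = N_CELLS - 1
ACTIONS = (-1, +1)


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Deterministic transition: move one cell, clipped to the corridor."""
    next_state = min(max(state + ACTIONS[action], 0), N_CELLS - 1)
    reward = 0.0 if next_state == GOAL else -1.0  # hand-written reward, -1 per step
    return next_state, reward, next_state == GOAL


def bellman_targets(rewards, q_next, done, gamma=0.9):
    """Bellman optimality target y = r + gamma * max_a' Q(s', a').

    No bootstrapping from terminal transitions (done = 1).
    """
    rewards = np.asarray(rewards, dtype=float)
    done = np.asarray(done, dtype=float)
    return rewards + gamma * (1.0 - done) * np.max(q_next, axis=1)


def q_value_iteration(gamma=0.9, iters=100):
    """Repeatedly apply the Bellman optimality backup to a tabular Q."""
    q = np.zeros((N_CELLS, len(ACTIONS)))
    for _ in range(iters):
        new_q = np.zeros_like(q)
        for s in range(N_CELLS):
            if s == GOAL:
                continue  # terminal state keeps Q = 0
            for a in range(len(ACTIONS)):
                s_next, r, done = step(s, a)
                # q[[s_next]] has shape (1, n_actions), as bellman_targets expects
                new_q[s, a] = bellman_targets([r], q[[s_next]], [done], gamma)[0]
        q = new_q
    return q


def greedy_trajectory(q, start=0, max_steps=20):
    """Read a 'desired' state sequence off Q by following argmax actions."""
    states = [start]
    s = start
    for _ in range(max_steps):
        if s == GOAL:
            break
        s, _, _ = step(s, int(np.argmax(q[s])))
        states.append(s)
    return states


q_star = q_value_iteration()
print(greedy_trajectory(q_star))  # [0, 1, 2, 3, 4, 5]
```

The deep variant replaces the table with a network trained to regress the same targets. In this toy the greedy rollout of the converged Q simply walks right to the goal, which is the sense in which a state trajectory can be read off an optimal action-value function.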
