

Intelligent Land-Vehicle Model Transfer Trajectory Planning Method Based on Deep Reinforcement Learning.

Affiliations

School of Information Science and Engineering, Central South University, Changsha 410083, China.

State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China.

Publication

Sensors (Basel). 2018 Sep 1;18(9):2905. doi: 10.3390/s18092905.

DOI: 10.3390/s18092905
PMID: 30200499
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6164024/
Abstract

To address the problem of model error and tracking dependence in the process of intelligent vehicle motion planning, an intelligent vehicle model transfer trajectory planning method based on deep reinforcement learning is proposed, which is able to obtain an effective control action sequence directly. Firstly, an abstract model of the real environment is extracted. On this basis, a deep deterministic policy gradient (DDPG) and a vehicle dynamic model are adopted to jointly train a reinforcement learning model, and to decide the optimal intelligent driving maneuver. Secondly, the actual scene is transferred to an equivalent virtual abstract scene using a transfer model. Furthermore, the control action and trajectory sequences are calculated according to the trained deep reinforcement learning model. Thirdly, the optimal trajectory sequence is selected according to an evaluation function in the real environment. Finally, the results demonstrate that the proposed method can deal with the problem of intelligent vehicle trajectory planning for continuous input and continuous output. The model transfer method improves the model's generalization performance. Compared with traditional trajectory planning, the proposed method outputs continuous rotation-angle control sequences. Moreover, the lateral control errors are also reduced.
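The abstract's final planning step — rolling out candidate control sequences through a vehicle model and selecting the best trajectory with an evaluation function — can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes a standard kinematic bicycle model for their dynamic model, hand-written steering candidates for the DDPG policy's output, and a simple lateral-error cost for their evaluation function; all parameter values (wheelbase, speed, step size, target lane offset) are illustrative assumptions.

```python
import math

# Assumed, illustrative constants (not from the paper).
WHEELBASE = 2.7   # metres
SPEED = 10.0      # m/s, held constant
DT = 0.1          # integration step, seconds

def rollout(steer_seq, x=0.0, y=0.0, yaw=0.0):
    """Integrate a kinematic bicycle model over a steering-angle sequence,
    returning the (x, y) trajectory it traces."""
    traj = [(x, y)]
    for delta in steer_seq:
        x += SPEED * math.cos(yaw) * DT
        y += SPEED * math.sin(yaw) * DT
        yaw += SPEED / WHEELBASE * math.tan(delta) * DT
        traj.append((x, y))
    return traj

def evaluate(traj, target_y=3.5):
    """Toy evaluation function: final lateral error to the target lane
    centre (lower is better)."""
    return abs(traj[-1][1] - target_y)

def best_trajectory(candidates, target_y=3.5):
    """Roll out every candidate control sequence and keep the trajectory
    with the lowest cost -- the selection step the abstract describes."""
    rollouts = [rollout(seq) for seq in candidates]
    return min(rollouts, key=lambda t: evaluate(t, target_y))

# Hand-written stand-ins for sequences a trained policy would emit.
candidates = [
    [0.0] * 30,                    # keep straight
    [0.05] * 15 + [-0.05] * 15,    # gentle lane change
    [0.1] * 15 + [-0.1] * 15,      # sharper lane change
]
best = best_trajectory(candidates)
```

In the paper's pipeline the candidate sequences come from the DDPG policy acting in the transferred abstract scene; here the point is only the rollout-then-evaluate structure, which selects the gentle lane change as the closest to the 3.5 m lane offset.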


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a93/6164024/a815288483c5/sensors-18-02905-g014a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a93/6164024/0926794ff1f7/sensors-18-02905-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a93/6164024/1cb6f2b9d0d7/sensors-18-02905-g013a.jpg

Similar Articles

1. Intelligent Land-Vehicle Model Transfer Trajectory Planning Method Based on Deep Reinforcement Learning.
   Sensors (Basel). 2018 Sep 1;18(9):2905. doi: 10.3390/s18092905.
2. Model-Based Predictive Control and Reinforcement Learning for Planning Vehicle-Parking Trajectories for Vertical Parking Spaces.
   Sensors (Basel). 2023 Aug 11;23(16):7124. doi: 10.3390/s23167124.
3. Coordinated Decision Control of Lane-Change and Car-Following for Intelligent Vehicle Based on Time Series Prediction and Deep Reinforcement Learning.
   Sensors (Basel). 2024 Jan 9;24(2):403. doi: 10.3390/s24020403.
4. An Autonomous Path Planning Model for Unmanned Ships Based on Deep Reinforcement Learning.
   Sensors (Basel). 2020 Jan 11;20(2):426. doi: 10.3390/s20020426.
5. Intelligent Vehicle Decision-Making and Trajectory Planning Method Based on Deep Reinforcement Learning in the Frenet Space.
   Sensors (Basel). 2023 Dec 14;23(24):9819. doi: 10.3390/s23249819.
6. Lane changing trajectory planning and tracking control for intelligent vehicle on curved road.
   Springerplus. 2016 Jul 22;5(1):1150. doi: 10.1186/s40064-016-2806-0. eCollection 2016.
7. End-to-End Automated Lane-Change Maneuvering Considering Driving Style Using a Deep Deterministic Policy Gradient Algorithm.
   Sensors (Basel). 2020 Sep 22;20(18):5443. doi: 10.3390/s20185443.
8. Deep Reinforcement Learning Based Trajectory Planning Under Uncertain Constraints.
   Front Neurorobot. 2022 May 2;16:883562. doi: 10.3389/fnbot.2022.883562. eCollection 2022.
9. Optimization of news dissemination push mode by intelligent edge computing technology for deep learning.
   Sci Rep. 2024 Mar 20;14(1):6671. doi: 10.1038/s41598-024-53859-7.
10. The use of deep learning algorithm and digital media art in all-media intelligent electronic music system.
   PLoS One. 2020 Oct 19;15(10):e0240492. doi: 10.1371/journal.pone.0240492. eCollection 2020.

Cited By

1. and Classification Based on Visible Capsule Images Using a Modified MobileNetV3-Small Network with Transfer Learning.
   Entropy (Basel). 2023 Mar 3;25(3):447. doi: 10.3390/e25030447.
2. Metalearning-Based Fault-Tolerant Control for Skid Steering Vehicles under Actuator Fault Conditions.
   Sensors (Basel). 2022 Jan 22;22(3):845. doi: 10.3390/s22030845.
3. A Collision Relationship-Based Driving Behavior Decision-Making Method for an Intelligent Land Vehicle at a Disorderly Intersection via DRQN.
   Sensors (Basel). 2022 Jan 14;22(2):636. doi: 10.3390/s22020636.
4. An Autonomous Path Planning Model for Unmanned Ships Based on Deep Reinforcement Learning.
   Sensors (Basel). 2020 Jan 11;20(2):426. doi: 10.3390/s20020426.
5. Learn to Steer through Deep Reinforcement Learning.
   Sensors (Basel). 2018 Oct 27;18(11):3650. doi: 10.3390/s18113650.

References

1. Optimal Polygon Decomposition for UAV Survey Coverage Path Planning in Wind.
   Sensors (Basel). 2018 Jul 3;18(7):2132. doi: 10.3390/s18072132.
2. A Method on Dynamic Path Planning for Robotic Manipulator Autonomous Obstacle Avoidance Based on an Improved RRT Algorithm.
   Sensors (Basel). 2018 Feb 13;18(2):571. doi: 10.3390/s18020571.
3. A simple introduction to Markov Chain Monte-Carlo sampling.
   Psychon Bull Rev. 2018 Feb;25(1):143-154. doi: 10.3758/s13423-016-1015-8.
4. Mastering the game of Go with deep neural networks and tree search.
   Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.
5. Human-level control through deep reinforcement learning.
   Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
6. Odometry and laser scanner fusion based on a discrete extended Kalman Filter for robotic platooning guidance.
   Sensors (Basel). 2011;11(9):8339-57. doi: 10.3390/s110908339. Epub 2011 Aug 29.
7. A model of hippocampally dependent navigation, using the temporal difference learning rule.
   Hippocampus. 2000;10(1):1-16. doi: 10.1002/(SICI)1098-1063(2000)10:1<1::AID-HIPO1>3.0.CO;2-1.