
Efficient Deep Reinforcement Learning With Imitative Expert Priors for Autonomous Driving.

Authors

Huang Zhiyu, Wu Jingda, Lv Chen

Publication Information

IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):7391-7403. doi: 10.1109/TNNLS.2022.3142822. Epub 2023 Oct 5.

Abstract

Deep reinforcement learning (DRL) is a promising way to achieve human-like autonomous driving. However, its low sample efficiency and the difficulty of designing reward functions hinder its application in practice. In light of this, this article proposes a novel framework that incorporates human prior knowledge into DRL to improve sample efficiency and save the effort of designing sophisticated reward functions. The framework consists of three ingredients: expert demonstration, policy derivation, and reinforcement learning (RL). In the expert demonstration step, a human expert demonstrates the task, and the observed behaviors are stored as state-action pairs. In the policy derivation step, an imitative expert policy is derived from the demonstration data using behavioral cloning with uncertainty estimation. In the RL step, the imitative expert policy guides the learning of the DRL agent by regularizing the KL divergence between the DRL agent's policy and the imitative expert policy. To validate the proposed method in autonomous driving applications, two simulated urban driving scenarios (unprotected left turn and roundabout) are designed. The training results manifest the strengths of the proposed method: it not only achieves the best performance but also significantly improves sample efficiency compared with the baseline algorithms (in particular, a 60% improvement over soft actor-critic). In testing, the agent trained by our method obtains the highest success rate and shows diverse, human-like driving behaviors as demonstrated by the human expert. We also find that training the imitative expert policy with an ensemble method that estimates both policy and model uncertainties, as well as increasing the training sample size, yields better training and testing performance, especially for more difficult tasks. The proposed method has thus shown its potential to facilitate DRL-enabled human-like autonomous driving systems in practice. The code and supplementary videos are provided at https://mczhi.github.io/Expert-Prior-RL/.
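The abstract does not state the exact learning objective, but the KL-regularized target it describes can plausibly be written in the following form, where pi is the agent's policy, pi_E the imitative expert policy, and alpha a regularization weight (all three symbols are assumptions introduced here for illustration, not notation from the paper):

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t}\big(r(s_t, a_t) - \alpha\, D_{\mathrm{KL}}\big(\pi(\cdot \mid s_t)\,\big\|\,\pi_E(\cdot \mid s_t)\big)\big)\Big]
```

Penalizing divergence from a fixed expert prior, rather than adding soft actor-critic's entropy bonus, is consistent with the abstract's claim that the imitative expert policy regularizes the agent's learning while the reward term remains simple.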
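As a rough illustration of the policy-derivation step, the sketch below trains an ensemble of behavioral-cloning policies on expert state-action pairs and uses disagreement across ensemble members as an uncertainty estimate. This is a minimal sketch under assumed design choices: PyTorch, an MSE cloning loss, and hypothetical names (PolicyNet, ImitativeExpertEnsemble, hidden sizes); the paper's actual architecture and uncertainty estimators may differ.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small MLP mapping a state to the mean of an action distribution."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        return self.net(state)

class ImitativeExpertEnsemble:
    """Ensemble of behavioral-cloning policies; the variance of the
    members' predictions serves as a simple model-uncertainty estimate."""
    def __init__(self, state_dim, action_dim, n_members=5, lr=1e-3):
        self.members = [PolicyNet(state_dim, action_dim) for _ in range(n_members)]
        self.optims = [torch.optim.Adam(m.parameters(), lr=lr) for m in self.members]

    def fit(self, states, actions, epochs=100):
        """Behavioral cloning: regress expert actions with an MSE loss."""
        for member, opt in zip(self.members, self.optims):
            for _ in range(epochs):
                opt.zero_grad()
                loss = nn.functional.mse_loss(member(states), actions)
                loss.backward()
                opt.step()

    @torch.no_grad()
    def predict(self, state):
        """Return the ensemble-mean action and the per-dimension variance
        across members (higher variance = less reliable expert prior)."""
        preds = torch.stack([m(state) for m in self.members])
        return preds.mean(dim=0), preds.var(dim=0)

# Hypothetical usage with demonstration tensors of shape
# (N, state_dim) and (N, action_dim):
# ensemble = ImitativeExpertEnsemble(state_dim=64, action_dim=2)
# ensemble.fit(demo_states, demo_actions)
# mean_action, uncertainty = ensemble.predict(current_state)
```

The resulting mean action defines the imitative expert policy used as the prior during the RL step, and the uncertainty estimate indicates where that prior should be trusted less, matching the abstract's remark that estimating both policy and model uncertainties improves performance.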

