The College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China.
Sensors (Basel). 2020 Oct 1;20(19):5626. doi: 10.3390/s20195626.
Autonomous driving enabled by artificial intelligence is viewed as a promising route to putting autonomous vehicles on the road in the near future. In recent years, considerable progress has been made in end-to-end autonomous driving with deep reinforcement learning (DRL). Still, driving safely and comfortably in real, dynamic scenarios with DRL remains nontrivial, because reward functions are typically pre-defined using expert knowledge. This paper proposes a human-in-the-loop DRL algorithm that learns personalized autonomous driving behavior progressively. Specifically, a progressively optimized reward function (PORF) learning model is built and integrated into the Deep Deterministic Policy Gradient (DDPG) framework; the combined method is called PORF-DDPG in this paper. PORF consists of two parts. The first is a typical pre-defined reward function on the system state. The second, which is the main contribution of this paper, is a deep neural network (DNN) that represents the driving-adjustment intention of a human observer. This DNN-based reward model takes front-view images as input and is learned progressively through active human supervision and intervention. The proposed approach is potentially useful in dynamic, constrained scenarios where classic DRL methods may frequently lead to dangerous collisions. Experimental results show that the proposed driving behavior learning method exhibits online learning capability and environmental adaptability.
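The abstract describes the PORF reward as the sum of a pre-defined state-based term and a learned term driven by human feedback, which then feeds DDPG. It does not give the form of either term, so the following is only a minimal sketch under loudly stated assumptions: the hand-designed part (speed tracking, lane-offset penalty, collision penalty) and its weights are hypothetical; the learned part is a linear model over a small image-feature vector standing in for the paper's DNN on front-view images; human feedback is assumed to arrive as a scalar label fitted by squared-error SGD. The critic target shown is the standard DDPG form.

```python
"""Hypothetical sketch of a two-part reward in the spirit of PORF.

Assumptions (not from the paper): the pre-defined reward terms and weights,
the linear stand-in for the DNN, and scalar human labels fitted by SGD.
"""
from dataclasses import dataclass, field
from typing import Dict, List


def predefined_reward(speed: float, target_speed: float,
                      lane_offset: float, collided: bool) -> float:
    """Hand-designed reward on the system state (hypothetical weights)."""
    if collided:
        return -10.0
    return -abs(speed - target_speed) / target_speed - 0.5 * abs(lane_offset)


@dataclass
class LearnedReward:
    """Linear stand-in for the DNN reward model over image features."""
    n_features: int
    lr: float = 0.1
    weights: List[float] = field(init=False)

    def __post_init__(self) -> None:
        # Start neutral: the learned term contributes nothing until trained.
        self.weights = [0.0] * self.n_features

    def predict(self, features: List[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, features: List[float], human_label: float) -> float:
        """One SGD step on 0.5 * (prediction - label)^2; returns the error."""
        error = self.predict(features) - human_label
        self.weights = [w - self.lr * error * x
                        for w, x in zip(self.weights, features)]
        return error


def porf_reward(state: Dict[str, float], features: List[float],
                model: LearnedReward) -> float:
    """Total reward = pre-defined state term + learned human-intention term."""
    base = predefined_reward(state["speed"], state["target_speed"],
                             state["lane_offset"], bool(state["collided"]))
    return base + model.predict(features)


def ddpg_critic_target(reward: float, gamma: float,
                       next_q: float, done: bool) -> float:
    """Standard DDPG target y = r + gamma * Q'(s', mu'(s')) for non-terminal s'."""
    return reward + (0.0 if done else gamma * next_q)
```

For example, if the human observer repeatedly intervenes when the car closes in on an obstacle, those frames could be labeled `-1.0`; after a few calls to `update`, `porf_reward` penalizes similar frames even though the hand-designed term alone would not. Keeping the two terms separate means the learned correction can be trained online without rewriting the expert-designed reward.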
Title: A Human-in-the-Loop Personalized Driving Behavior Learning Method for Deep-Learning-Based Autonomous Driving