Zhang Jin, Ma Nan, Wu Zhixuan, Wang Cheng, Yao Yongqiang
Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China.
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.
Math Biosci Eng. 2024 May 24;21(5):6077-6096. doi: 10.3934/mbe.2024267.
Self-driving in dense traffic flow is highly challenging due to the complexity of the driving environment and the dynamic behavior of traffic participants. Traditional methods usually rely on predefined rules and struggle to adapt to diverse driving scenarios. Deep reinforcement learning (DRL) shows advantages over rule-based methods in complex self-driving environments, demonstrating great potential for intelligent decision-making. However, one problem of DRL is inefficient exploration: it typically requires extensive trial and error to learn the optimal policy, which slows learning and makes it difficult for the agent to acquire well-performing decision-making policies in self-driving scenarios. Inspired by the outstanding performance of supervised learning in classification tasks, we propose a self-driving intelligent control method that combines human driving experience with an adaptive sampling supervised actor-critic algorithm. Unlike traditional DRL, we modify the learning process of the policy network by combining supervised learning with DRL and adding human driving experience to the learning samples, so that human driving experience and real-time human guidance better steer the self-driving vehicle toward the optimal policy. In addition, to make the agent learn more efficiently, we introduce real-time human guidance into its learning process and design an adaptive balanced sampling method to improve sampling performance. We also design the reward function in detail for different evaluation indexes, such as traffic efficiency, which further guides the agent to learn the self-driving intelligent control policy. Experimental results show that the method can control vehicles in complex traffic environments for self-driving tasks and outperforms other DRL methods.
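The abstract does not specify how the adaptive balanced sampling between human demonstrations and agent-collected experience works. As a minimal sketch of one plausible scheme (all names and the decaying-ratio rule are assumptions, not the authors' design), a replay buffer could keep human and agent transitions separate and draw a batch whose human share starts high and decays toward a floor as training progresses:

```python
import random

class BalancedReplayBuffer:
    """Hypothetical replay buffer mixing human-demonstrated and
    agent-collected transitions, with an adaptively decaying human share."""

    def __init__(self, initial_human_ratio=0.5, min_human_ratio=0.1, decay=0.999):
        self.human = []   # transitions demonstrated by a human driver
        self.agent = []   # transitions collected by the DRL agent
        self.human_ratio = initial_human_ratio
        self.min_human_ratio = min_human_ratio
        self.decay = decay

    def add(self, transition, from_human):
        (self.human if from_human else self.agent).append(transition)

    def sample(self, batch_size):
        # Adaptively shrink the human share toward the floor value,
        # so early batches lean on demonstrations and later ones on
        # the agent's own experience.
        self.human_ratio = max(self.min_human_ratio,
                               self.human_ratio * self.decay)
        n_human = min(len(self.human), round(batch_size * self.human_ratio))
        n_agent = min(len(self.agent), batch_size - n_human)
        return (random.sample(self.human, n_human) +
                random.sample(self.agent, n_agent))
```

Under this reading, the supervised (imitation) loss would be computed on the human portion of each batch and the actor-critic loss on the full batch; the actual balancing rule and loss weighting used in the paper are described in its methods section, not the abstract.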