Zhao Rui, Fan Yuze, Li Yun, Zhang Dong, Gao Fei, Gao Zhenhai, Yang Zhengcai
College of Automotive Engineering, Jilin University, Changchun 130025, China.
Graduate School of Information and Science Technology, The University of Tokyo, Tokyo 113-8654, Japan.
Sensors (Basel). 2025 Jan 1;25(1):191. doi: 10.3390/s25010191.
Autonomous driving has demonstrated impressive driving capabilities, with behavior decision-making playing a crucial role as a bridge between perception and control. Imitation Learning (IL) and Reinforcement Learning (RL) have introduced innovative approaches to behavior decision-making in autonomous driving, but challenges remain. On one hand, RL's policy networks often lack sufficient reasoning ability to make optimal decisions in highly complex and stochastic environments. On the other hand, the complexity of these environments leads to low sample efficiency in RL, making it difficult to learn driving policies efficiently. To address these challenges, we propose an innovative Knowledge Distillation-Enhanced Behavior Transformer (KD-BeT) framework. Building on the successful application of Transformers in large language models, we introduce the Behavior Transformer as the policy network in RL, taking the observation-action history as input for sequential decision-making and thereby leveraging the Transformer's contextual reasoning capabilities. Using a teacher-student paradigm, we first train a small-capacity teacher model quickly and accurately through IL, then apply knowledge distillation to improve RL's training efficiency and performance. Simulation results demonstrate that KD-BeT maintains fast convergence and high asymptotic performance during training. In the CARLA NoCrash benchmark tests, KD-BeT outperforms other state-of-the-art methods in terms of traffic efficiency and driving safety, offering a novel solution for addressing real-world autonomous driving tasks.
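The teacher-student distillation step described in the abstract can be illustrated with a minimal sketch: the IL-trained teacher's action distribution supervises the RL student via a temperature-softened KL divergence. This is not the paper's implementation; the function names, the discrete action set, and the use of a plain KL objective are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over action logits; higher temperature
    # softens the distribution so the student sees the teacher's
    # relative preferences, not just its argmax.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) between two discrete action distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Soften both distributions with the same temperature, then penalize
    # the student for diverging from the (frozen) teacher.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)

# Hypothetical logits over a small driving action set, e.g.
# {accelerate, keep speed, brake}.
teacher = [2.0, 0.5, -1.0]
aligned = distillation_loss(teacher, [2.0, 0.5, -1.0])  # student matches teacher
off = distillation_loss(teacher, [-1.0, 0.5, 2.0])      # student disagrees
```

In a full training loop this distillation term would typically be combined with the RL objective (weighted and annealed as the student surpasses the teacher), which is the usual way a teacher-student scheme accelerates early RL training.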