School of Automotive Studies, Tongji University, Shanghai 201804, China.
Sensors (Basel). 2020 Dec 18;20(24):7297. doi: 10.3390/s20247297.
Reinforcement learning (RL) is a promising direction in automated parking systems (APSs), as integrating planning and tracking control with RL can potentially maximize overall performance. However, commonly used model-free RL requires many interactions to achieve acceptable performance, while model-based RL in APS cannot learn continuously. In this paper, a data-efficient RL method is constructed that learns from data using a model-based approach. The proposed method uses a truncated Monte Carlo tree search to evaluate parking states and select moves. Two artificial neural networks are trained on self-generated data to provide the search probability of each tree branch and the final reward for each state. Data efficiency is enhanced by weighting exploration with parking trajectory returns, an adaptive exploration scheme, and experience augmentation with imaginary rollouts. A novel training pipeline is also used to train the initial action guidance network and the state value network without human demonstrations. Compared with path-planning and path-following methods, the proposed integrated method can flexibly coordinate longitudinal and lateral motion to park in a smaller parking space in one maneuver. Its adaptability to changes in the vehicle model is verified by joint CarSim and MATLAB simulation, demonstrating that the algorithm converges within a few iterations. Finally, experiments on a real vehicle platform further verify the effectiveness of the proposed method. Compared with obtaining rewards through simulation, the proposed method achieves a better final parking attitude and success rate.
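The abstract describes an AlphaZero-style scheme: a truncated tree search guided by a policy (action guidance) network and a value network that scores leaf states so rollouts can stop early. The following is a minimal sketch of that general mechanism, not the authors' implementation; all names (`Node`, `C_PUCT`, the prior/value inputs) are illustrative assumptions.

```python
import math

C_PUCT = 1.5  # exploration constant in the PUCT selection rule (assumed value)

class Node:
    """One tree branch: stores the network prior and visit statistics."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the action guidance network
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # action -> Node

    def q(self):
        # Mean value Q(s, a); zero for unvisited branches
        return self.value_sum / self.visits if self.visits else 0.0

def expand(node, action_priors):
    """Attach one child per candidate action with its network prior."""
    for action, p in action_priors.items():
        node.children[action] = Node(p)

def select_child(node):
    """Pick the child maximizing Q + U: exploit high values,
    explore branches the policy network rates as promising."""
    total = sum(c.visits for c in node.children.values())
    def score(child):
        u = C_PUCT * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def backup(path, leaf_value):
    """Truncation step: instead of simulating the full maneuver, the
    value network's estimate at the leaf is propagated up the path."""
    for node in reversed(path):
        node.visits += 1
        node.value_sum += leaf_value
```

Repeating select, expand, and backup yields visit counts at the root, from which move probabilities and training targets for both networks could be derived.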