Li Jingwen, Ma Yining, Gao Ruize, Cao Zhiguang, Lim Andrew, Song Wen, Zhang Jie
IEEE Trans Cybern. 2022 Dec;52(12):13572-13585. doi: 10.1109/TCYB.2021.3111082. Epub 2022 Nov 18.
Existing deep reinforcement learning (DRL)-based methods for solving the capacitated vehicle routing problem (CVRP) intrinsically cope with a homogeneous vehicle fleet, in which the fleet is assumed as repetitions of a single vehicle. Hence, their key to construct a solution solely lies in the selection of the next node (customer) to visit excluding the selection of vehicle. However, vehicles in real-world scenarios are likely to be heterogeneous with different characteristics that affect their capacity (or travel speed), rendering existing DRL methods less effective. In this article, we tackle heterogeneous CVRP (HCVRP), where vehicles are mainly characterized by different capacities. We consider both min-max and min-sum objectives for HCVRP, which aim to minimize the longest or total travel time of the vehicle(s) in the fleet. To solve those problems, we propose a DRL method based on the attention mechanism with a vehicle selection decoder accounting for the heterogeneous fleet constraint and a node selection decoder accounting for the route construction, which learns to construct a solution by automatically selecting both a vehicle and a node for this vehicle at each step. Experimental results based on randomly generated instances show that, with desirable generalization to various problem sizes, our method outperforms the state-of-the-art DRL method and most of the conventional heuristics, and also delivers competitive performance against the state-of-the-art heuristic method, that is, slack induction by string removal. In addition, the results of extended experiments demonstrate that our method is also able to solve CVRPLib instances with satisfactory performance.
现有的基于深度强化学习(DRL)解决容量受限车辆路径问题(CVRP)的方法本质上处理的是同构车辆车队,即车队被假定为单一车辆的重复。因此,它们构建解决方案的关键仅在于选择下一个要访问的节点(客户),而不包括车辆的选择。然而,现实场景中的车辆可能具有不同特性从而呈现异构性,这些特性会影响其容量(或行驶速度),使得现有的深度强化学习方法效果不佳。在本文中,我们处理异构CVRP(HCVRP),其中车辆主要以不同容量为特征。我们考虑了HCVRP的最小最大和最小和目标,其旨在最小化车队中车辆的最长或总行驶时间。为了解决这些问题,我们提出了一种基于注意力机制的深度强化学习方法,该方法具有一个考虑异构车队约束的车辆选择解码器和一个考虑路径构建的节点选择解码器,它通过在每一步自动为车辆选择车辆和节点来学习构建解决方案。基于随机生成实例的实验结果表明,我们的方法对各种问题规模具有良好的泛化能力,优于现有最先进的深度强化学习方法和大多数传统启发式算法,并且与现有最先进的启发式方法(即通过字符串删除进行松弛归纳)相比也具有竞争力。此外,扩展实验结果表明,我们的方法也能够以令人满意的性能解决CVRPLib实例。