Wang Zhenkun, Yao Shunyu, Li Genghui, Zhang Qingfu
IEEE Trans Cybern. 2024 Mar;54(3):1984-1996. doi: 10.1109/TCYB.2023.3312476. Epub 2024 Feb 9.
This article proposes utilizing a single deep reinforcement learning model to solve combinatorial multiobjective optimization problems. We use the well-known multiobjective traveling salesman problem (MOTSP) as an example. Our proposed method employs an encoder-decoder framework to learn the mapping from the MOTSP instance to its Pareto-optimal set. Specifically, it leverages a novel routing encoder to extract information for both the entire multiobjective aspect and every individual objective from the MOTSP instance. The global embeddings and each objective's embeddings are adaptively aggregated via a routing network to form the subproblems' embedding that can well represent the MOTSP features. Using a modified context embedding, the subproblems' embeddings are fed into a decoder to produce a set of approximate Pareto-optimal solutions in parallel. Additionally, we develop a Top-k baseline to enable more efficient data utilization and lightweight training for our proposed method. We compare our method with heuristic-based and learning-based ones on various types of MOTSP instances, and the experimental results show that our method can solve MOTSP instances in real-time and outperform the other algorithms, especially on large-scale problem instances.
本文提出利用单个深度强化学习模型来解决组合多目标优化问题。我们以著名的多目标旅行商问题(MOTSP)为例。我们提出的方法采用编码器 - 解码器框架来学习从MOTSP实例到其帕累托最优集的映射。具体而言,它利用一种新颖的路由编码器从MOTSP实例中提取整个多目标方面以及每个单独目标的信息。全局嵌入和每个目标的嵌入通过路由网络进行自适应聚合,以形成能够很好地表示MOTSP特征的子问题嵌入。使用修改后的上下文嵌入,将子问题的嵌入输入到解码器中,以并行生成一组近似帕累托最优解。此外,我们开发了一种Top - k基线,以实现更高效的数据利用和我们提出方法的轻量级训练。我们在各种类型的MOTSP实例上,将我们的方法与基于启发式和基于学习的方法进行比较,实验结果表明,我们的方法可以实时解决MOTSP实例,并且优于其他算法,特别是在大规模问题实例上。