Deep Reinforcement Learning for Multiobjective Optimization.

Authors

Li Kaiwen, Zhang Tao, Wang Rui

Publication information

IEEE Trans Cybern. 2021 Jun;51(6):3103-3114. doi: 10.1109/TCYB.2020.2977661. Epub 2021 May 18.

Abstract

This article proposes an end-to-end framework for solving multiobjective optimization problems (MOPs) using deep reinforcement learning (DRL), which we call the DRL-based multiobjective optimization algorithm (DRL-MOA). The MOP is first decomposed into a set of scalar optimization subproblems. Then, each subproblem is modeled as a neural network. Model parameters of all the subproblems are optimized collaboratively according to a neighborhood-based parameter-transfer strategy and the DRL training algorithm. Pareto-optimal solutions can be directly obtained through the trained neural-network models. Specifically, the multiobjective traveling salesman problem (MOTSP) is solved in this article using the DRL-MOA method by modeling each subproblem as a Pointer Network. Extensive experiments study the DRL-MOA and compare it with various benchmark methods. They show that once the trained model is available, it can scale to newly encountered problems without retraining. The solutions can be directly obtained by a simple forward calculation of the neural network; thereby, no iteration is required and the MOP can always be solved in a reasonable time. The proposed method provides a new way of solving the MOP by means of DRL. It shows a set of new characteristics, for example, strong generalization ability and fast solving speed in comparison with existing methods for multiobjective optimization. The experimental results show the effectiveness and competitiveness of the proposed method in terms of model performance and running time.
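The decomposition and neighbor-to-neighbor transfer described in the abstract can be illustrated with a minimal sketch for a two-objective TSP. Everything here is an assumption made for illustration: the abstract does not state which scalarization is used, so a weighted sum is assumed; a plain 2-opt local search stands in for the trained Pointer Network; and the warm start passes a tour between neighboring subproblems rather than network parameters. All function names are hypothetical.

```python
import math
import random


def weight_vectors(n_subproblems):
    """Evenly spaced weights (w1, w2) with w1 + w2 = 1 for two objectives."""
    if n_subproblems == 1:
        return [(0.5, 0.5)]
    step = 1 / (n_subproblems - 1)
    return [(i * step, 1 - i * step) for i in range(n_subproblems)]


def tour_length(tour, coords):
    """Length of the closed tour over 2-D city coordinates."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))


def scalarized_cost(tour, coords_a, coords_b, weights):
    """Weighted sum of the two objectives (assumed scalarization)."""
    w1, w2 = weights
    return w1 * tour_length(tour, coords_a) + w2 * tour_length(tour, coords_b)


def two_opt(coords_a, coords_b, weights, start_tour=None):
    """Stand-in solver: 2-opt local search, NOT the paper's Pointer Network."""
    n = len(coords_a)
    tour = list(start_tour) if start_tour is not None else list(range(n))
    best = scalarized_cost(tour, coords_a, coords_b, weights)
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                # Reverse the segment tour[i..j] and keep it if it is better.
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                cost = scalarized_cost(candidate, coords_a, coords_b, weights)
                if cost < best - 1e-12:
                    tour, best, improved = candidate, cost, True
    return tour


def solve_subproblems(coords_a, coords_b, n_subproblems, solve_one):
    """Solve subproblems in weight order, warm-starting each from its neighbor."""
    solutions = []
    state = None  # result of the previous (neighboring) subproblem
    for weights in weight_vectors(n_subproblems):
        state = solve_one(coords_a, coords_b, weights, state)
        solutions.append((weights, state))
    return solutions


if __name__ == "__main__":
    random.seed(0)
    cities_a = [(random.random(), random.random()) for _ in range(8)]
    cities_b = [(random.random(), random.random()) for _ in range(8)]
    for weights, tour in solve_subproblems(cities_a, cities_b, 5, two_opt):
        objectives = (tour_length(tour, cities_a), tour_length(tour, cities_b))
        print(weights, tour, objectives)
```

Each weight vector yields one tour, and the resulting pairs of objective values approximate a trade-off front. In the method the abstract describes, the per-subproblem solver would be a trained neural network, so obtaining a solution would need only a forward pass rather than the iterative search used in this stand-in.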

