A novel method-based reinforcement learning with deep temporal difference network for flexible double shop scheduling problem.

Author information

Wang Xiao, Zhong Peisi, Liu Mei, Zhang Chao, Yang Shihao

Affiliations

College of Mechanical and Electronic Engineering, Shandong University of Science and Technology, Qingdao, 266590, China.

Advanced Manufacturing Technology Centre, Shandong University of Science and Technology, Qingdao, 266590, China.

Publication information

Sci Rep. 2024 Apr 20;14(1):9047. doi: 10.1038/s41598-024-59414-8.

DOI:10.1038/s41598-024-59414-8

PMID:38641689

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11031591/

Abstract

This paper studies the flexible double shop scheduling problem (FDSSP), which considers the job shop and the assembly shop simultaneously and therefore requires the scheduling of related tasks in the two shops to be coordinated. To this end, a reinforcement learning algorithm with a deep temporal difference network is proposed to minimize the makespan. First, the FDSSP is formulated as a mathematical model of the flexible job-shop scheduling problem extended with an assembly constraint level. It is then cast as a Markov decision process in which behavioral strategies are selected directly from historical machining state data. Second, ten generic state features are proposed and fed into a deep neural network that fits the state value function, and eight simple constructive heuristics serve as candidate actions for scheduling decisions. A greedy mechanism then selects the best combined action of all machines at each decision step. Finally, a deep temporal difference reinforcement learning framework is established, and extensive comparative experiments are conducted to analyze the basic performance of the algorithm. The results show that the proposed algorithm outperforms most of the compared methods, which supports its use for practical production problems in the manufacturing industry.
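The two mechanisms named in the abstract, a temporal difference update of a state value function and greedy selection among heuristic dispatching rules, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear value function in place of the deep network, a two-element feature vector in place of the ten state features, a reward equal to negative elapsed time, and two hypothetical rule names (`SPT`, `LPT`), since the abstract does not list the eight heuristics.

```python
from typing import Dict, List, Sequence, Tuple


def value(w: Sequence[float], phi: Sequence[float]) -> float:
    """Linear stand-in for the value network: V(s) = w . phi(s)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def td0_update(
    w: Sequence[float],
    phi: Sequence[float],
    reward: float,
    phi_next: Sequence[float],
    done: bool,
    alpha: float = 0.1,
    gamma: float = 1.0,
) -> List[float]:
    """One TD(0) step: w <- w + alpha * delta * phi(s)."""
    # Bootstrap from V(s') unless the episode (schedule) is finished.
    target = reward + (0.0 if done else gamma * value(w, phi_next))
    delta = target - value(w, phi)  # temporal difference error
    return [wi + alpha * delta * xi for wi, xi in zip(w, phi)]


def greedy_rule(
    w: Sequence[float],
    outcomes: Dict[str, Tuple[float, Sequence[float]]],
    gamma: float = 1.0,
) -> str:
    """Return the rule whose simulated outcome maximizes r + gamma * V(s').

    `outcomes` maps a rule name to (reward, next-state features), assumed to
    come from simulating that rule for one decision step.
    """
    return max(
        outcomes,
        key=lambda name: outcomes[name][0] + gamma * value(w, outcomes[name][1]),
    )


# Illustrative numbers only; they are not taken from the paper.
w = td0_update([0.0, 0.0], phi=[1.0, 0.5], reward=-2.0,
               phi_next=[1.0, 0.0], done=False)
candidates = {"SPT": (-3.0, [1.0, 0.2]), "LPT": (-5.0, [1.0, 0.1])}
chosen = greedy_rule(w, candidates)
```

With a negative-time reward, maximizing return corresponds to finishing sooner, which is why such a reward is a common proxy for makespan minimization. In the paper's setting the linear `value` function would be replaced by the trained deep network.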
