Intelligent Decision-Making of Scheduling for Dynamic Permutation Flowshop via Deep Reinforcement Learning

Affiliations

Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China.

Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China.

Publication Information

Sensors (Basel). 2021 Feb 2;21(3):1019. doi: 10.3390/s21031019.

Abstract

Dynamic scheduling problems have received increasing attention in recent years because of their practical implications. To realize real-time, intelligent decision-making for dynamic scheduling, we studied the dynamic permutation flowshop scheduling problem (PFSP) with new job arrivals using deep reinforcement learning (DRL). A system architecture for solving the dynamic PFSP with DRL is proposed, and a mathematical model that minimizes total tardiness cost is established. The DRL-based intelligent scheduling system is then modeled, with its state features, actions, and reward designed. The advantage actor-critic (A2C) algorithm is adapted to train the scheduling agent. The learning curve indicates that the scheduling agent learned to generate better solutions efficiently during training. Extensive experiments compare the A2C-based scheduling agent with each single action, other DRL algorithms, and meta-heuristics. The results show the strong performance of the A2C-based scheduling agent in terms of solution quality, CPU time, and generalization. Notably, the trained agent generates a scheduling action in only 2.16 ms on average, which is almost instantaneous and suitable for real-time scheduling. Our work can help build a self-learning, real-time-optimizing, intelligent decision-making scheduling system.
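
The abstract describes the approach only at a high level, so below is a minimal illustrative sketch, not the authors' implementation: an advantage actor-critic (A2C) agent in PyTorch that maps a fixed-length vector of shop-floor state features to a choice among candidate scheduling actions. The feature count, hidden size, action set, and hyperparameters are placeholders assumed for illustration; likewise, a standard total tardiness cost objective of the form minimize sum_j w_j * max(0, C_j - d_j), where C_j and d_j are the completion time and due date of job j, is assumed here because the abstract does not give the exact formulation.

# Illustrative A2C sketch (not the authors' code): the agent picks one of
# several candidate scheduling actions from a fixed-length state-feature
# vector. Sizes and hyperparameters are placeholders.
import torch
import torch.nn as nn


class ActorCritic(nn.Module):
    """Shared trunk with a policy head (logits over actions) and a value head V(s)."""

    def __init__(self, n_features: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        return self.policy_head(h), self.value_head(h)


def select_action(model: ActorCritic, state: torch.Tensor):
    """Sample a scheduling action for the current shop state."""
    logits, value = model(state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    return action, dist.log_prob(action), dist.entropy(), value


def a2c_update(optimizer, log_prob, entropy, value, reward, next_value, done, gamma=0.99):
    """One-step A2C update with advantage = r + gamma * V(s') - V(s)."""
    target = reward + gamma * next_value.detach() * (1.0 - done)
    advantage = target - value
    policy_loss = -log_prob * advantage.detach()  # reinforce better-than-expected actions
    value_loss = advantage.pow(2)                 # fit V(s) to the bootstrapped target
    loss = (policy_loss + 0.5 * value_loss - 0.01 * entropy).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Example wiring with placeholder sizes: 8 state features, 4 candidate actions.
model = ActorCritic(n_features=8, n_actions=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

Once trained, each decision needs only a single forward pass through a small network, which is consistent with the near-instantaneous per-action times reported in the abstract.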

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ef/7867337/eb46cfaf2821/sensors-21-01019-g001.jpg
