Residual Physics and Post-Posed Shielding for Safe Deep Reinforcement Learning Method.

Authors

Zhang Qingang, Mahbod Muhammad Haiqal Bin, Chng Chin-Boon, Lee Poh-Seng, Chui Chee-Kong

Publication information

IEEE Trans Cybern. 2024 Feb;54(2):865-876. doi: 10.1109/TCYB.2022.3178084. Epub 2024 Jan 17.

DOI: 10.1109/TCYB.2022.3178084
PMID: 35700256

Abstract

Deep reinforcement learning (DRL) has been researched for computer room air conditioning unit control problems in data centers (DCs). However, two main issues limit the deployment of DRL in actual systems. First, a large amount of data is needed. Next, as a mission-critical system, safe control needs to be guaranteed, and temperatures in DCs should be kept within a certain operating range. To mitigate these issues, this article proposes a novel control method RP-SDRL. First, Residual Physics, built using the first law of thermodynamics, is integrated with the DRL algorithm and a Prediction Model. Subsequently, a Correction Model adapted from gradient descent is combined with the Prediction Model as Post-Posed Shielding to enforce safe actions. The RP-SDRL method was validated using simulation. Noise is added to the states of the model to further test its performance under state uncertainty. Experimental results show that the combination of Residual Physics and DRL can significantly improve the initial policy, sample efficiency, and robustness. Residual Physics can also improve the sample efficiency and the accuracy of the prediction model. While DRL alone cannot avoid constraint violations, RP-SDRL can detect unsafe actions and significantly reduce violations. Compared to the baseline controller, about 13% of electricity usage can be saved.
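
The abstract does not give the equations behind Residual Physics or Post-Posed Shielding, so the following Python sketch is only an illustration of the two ideas, not the authors' implementation. It assumes a single lumped air zone whose temperature follows a first-law energy balance, a physics-based prior action to which a DRL policy adds a learned residual, and a shield that corrects an unsafe action by gradient descent on the predicted temperature overshoot. All names, constants, and the safety margin (`predict_next_temp`, `physics_action`, `shield`, `CAPACITY`, `T_MAX`, and so on) are hypothetical choices made for the example.

```python
"""Toy sketch: physics prior + learned residual, then a gradient-based safety shield."""

DT = 60.0          # control interval in seconds (assumed)
CAPACITY = 500.0   # lumped thermal capacity of the room air in kJ/K (assumed)
U_MAX = 50.0       # maximum cooling power in kW (assumed)
T_MAX = 27.0       # upper temperature limit in deg C (assumed)


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def predict_next_temp(temp, cooling, heat_load):
    """Stand-in prediction model from the first law: C * dT/dt = Q_load - Q_cool."""
    return temp + DT / CAPACITY * (heat_load - cooling)


def physics_action(temp, heat_load, setpoint=24.0, gain=2.0):
    """Physics prior: remove the heat load, plus a proportional pull to the setpoint."""
    return heat_load + gain * (temp - setpoint)


def residual_action(temp, heat_load, learned_residual):
    """Final action = physics prior + residual proposed by a (hypothetical) DRL policy."""
    return clip(physics_action(temp, heat_load) + learned_residual, 0.0, U_MAX)


def violation(temp, cooling, heat_load, margin):
    """Squared predicted overshoot above (T_MAX - margin); zero when comfortably safe."""
    excess = predict_next_temp(temp, cooling, heat_load) - (T_MAX - margin)
    return max(0.0, excess) ** 2


def shield(temp, cooling, heat_load, margin=0.2, lr=10.0, steps=50, eps=1e-3):
    """Return safe actions unchanged; otherwise descend the violation loss.

    The margin makes the descent aim slightly below T_MAX, so the loop can
    stop after a finite number of steps. If the step budget runs out, the
    best-effort action is returned and may still be unsafe.
    """
    for _ in range(steps):
        if predict_next_temp(temp, cooling, heat_load) <= T_MAX:
            break
        # Central finite difference, so any black-box prediction model would work here.
        grad = (violation(temp, cooling + eps, heat_load, margin)
                - violation(temp, cooling - eps, heat_load, margin)) / (2.0 * eps)
        cooling = clip(cooling - lr * grad, 0.0, U_MAX)
    return cooling


if __name__ == "__main__":
    temp, heat_load = 26.8, 30.0
    raw = residual_action(temp, heat_load, learned_residual=-20.0)  # aggressive energy saving
    safe = shield(temp, raw, heat_load)
    print("raw action :", raw, "->", predict_next_temp(temp, raw, heat_load))
    print("safe action:", safe, "->", predict_next_temp(temp, safe, heat_load))
```

In this sketch the shield acts after the policy has chosen its action, which is one plausible reading of "post-posed": the policy stays free to explore, and only actions whose predicted outcome breaks the temperature limit are modified.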

Cited by

1
Lake eutrophication prediction based on improved MIMO-DD-3Q Learning.
PLoS One. 2023 Nov 14;18(11):e0294278. doi: 10.1371/journal.pone.0294278. eCollection 2023.