Residual Physics and Post-Posed Shielding for Safe Deep Reinforcement Learning Method.

Author information

Zhang Qingang, Mahbod Muhammad Haiqal Bin, Chng Chin-Boon, Lee Poh-Seng, Chui Chee-Kong

Publication information

IEEE Trans Cybern. 2024 Feb;54(2):865-876. doi: 10.1109/TCYB.2022.3178084. Epub 2024 Jan 17.

Abstract

Deep reinforcement learning (DRL) has been researched for computer room air conditioning unit control problems in data centers (DCs). However, two main issues limit the deployment of DRL in actual systems. First, a large amount of data is needed. Second, because a DC is a mission-critical system, safe control needs to be guaranteed, and temperatures in DCs should be kept within a certain operating range. To mitigate these issues, this article proposes a novel control method, RP-SDRL. First, Residual Physics, built using the first law of thermodynamics, is integrated with the DRL algorithm and a Prediction Model. Subsequently, a Correction Model adapted from gradient descent is combined with the Prediction Model as Post-Posed Shielding to enforce safe actions. The RP-SDRL method was validated using simulation. Noise is added to the states of the model to further test its performance under state uncertainty. Experimental results show that the combination of Residual Physics and DRL can significantly improve the initial policy, sample efficiency, and robustness. Residual Physics can also improve the sample efficiency and the accuracy of the Prediction Model. While DRL alone cannot avoid constraint violations, RP-SDRL can detect unsafe actions and significantly reduce violations. Compared to the baseline controller, about 13% of electricity usage can be saved.
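
The two ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no equations, so the single-zone lumped energy balance, every constant, all function names, the safety margin, and the finite-difference gradient below are assumptions made purely for illustration. The sketch reads "Residual Physics" as a physics-derived action plus a learned correction, and "Post-Posed Shielding" as a check of the proposed action against a prediction model, followed by gradient-descent correction when the predicted temperature is unsafe.

```python
# Toy single-zone sketch. All constants and names are hypothetical,
# chosen for illustration and not taken from the paper.
HEAT_CAPACITY = 500.0  # kJ/K, assumed lumped thermal capacity of the room
IT_LOAD = 10.0         # kW, assumed constant heat load from the servers
MAX_COOLING = 20.0     # kW, assumed cooling power when action == 1.0
DT = 60.0              # s, assumed control interval
T_MAX = 27.0           # degC, assumed upper temperature limit


def clip01(x):
    """Keep the cooling fraction inside its valid range [0, 1]."""
    return min(1.0, max(0.0, x))


def predict_temperature(temp, action):
    """Prediction model from an energy balance: C * dT/dt = Q_it - Q_cool."""
    net_heat = IT_LOAD - action * MAX_COOLING
    return temp + DT * net_heat / HEAT_CAPACITY


def physics_action(temp, target):
    """Cooling fraction that the energy balance says reaches target in one step."""
    needed_cooling = IT_LOAD - (target - temp) * HEAT_CAPACITY / DT
    return clip01(needed_cooling / MAX_COOLING)


def residual_action(temp, target, learned_residual):
    """Residual idea: physics-based action plus a correction from the policy."""
    return clip01(physics_action(temp, target) + learned_residual)


def shield(temp, action, margin=0.2, lr=0.05, max_steps=200, eps=1e-4):
    """Post-hoc check: if the predicted temperature exceeds T_MAX, descend the
    squared excess over (T_MAX - margin) until the prediction is safe."""
    def loss(a):
        return max(0.0, predict_temperature(temp, a) - (T_MAX - margin)) ** 2

    for _ in range(max_steps):
        if predict_temperature(temp, action) <= T_MAX:
            break  # predicted state is safe, leave the action alone
        # Central finite difference stands in for autograd on a learned model.
        grad = (loss(action + eps) - loss(action - eps)) / (2 * eps)
        action = clip01(action - lr * grad)
    return action


# Usage: a policy residual that cuts cooling too far gets corrected.
proposed = residual_action(temp=26.5, target=25.0, learned_residual=-0.9)
safe = shield(26.5, proposed)
```

The margin is a design choice of this sketch: plain gradient descent on the squared excess approaches the limit only asymptotically, so aiming slightly below `T_MAX` lets the loop stop after a few steps. If the action saturates at 1.0 and the prediction is still unsafe, this sketch simply returns the saturated action.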
