Residual Physics and Post-Posed Shielding for Safe Deep Reinforcement Learning Method.

Author information

Zhang Qingang, Mahbod Muhammad Haiqal Bin, Chng Chin-Boon, Lee Poh-Seng, Chui Chee-Kong

Publication information

IEEE Trans Cybern. 2024 Feb;54(2):865-876. doi: 10.1109/TCYB.2022.3178084. Epub 2024 Jan 17.

Abstract

Deep reinforcement learning (DRL) has been studied for controlling computer room air conditioning units in data centers (DCs). However, two main issues limit the deployment of DRL in real systems. First, DRL requires a large amount of data. Second, because a DC is a mission-critical system, control must be guaranteed to be safe, with DC temperatures kept within a specified operating range. To mitigate these issues, this article proposes a novel control method, RP-SDRL. First, Residual Physics, built using the first law of thermodynamics, is integrated with the DRL algorithm and a Prediction Model. Subsequently, a Correction Model adapted from gradient descent is combined with the Prediction Model as Post-Posed Shielding to enforce safe actions. The RP-SDRL method was validated in simulation, and noise was added to the model states to further test its performance under state uncertainty. Experimental results show that combining Residual Physics with DRL significantly improves the initial policy, sample efficiency, and robustness. Residual Physics also improves the sample efficiency and accuracy of the Prediction Model. While DRL alone cannot avoid constraint violations, RP-SDRL detects unsafe actions and significantly reduces violations. Compared with the baseline controller, about 13% of electricity usage can be saved.
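The abstract describes Residual Physics and Post-Posed Shielding only at a high level. The sketch below illustrates one plausible reading under assumptions that are ours, not the paper's: a single-zone data hall whose air temperature follows the first-law energy balance C dT/dt = P_IT - G (T_room - T_supply); a DRL action that is a supply-air temperature setpoint; a learned residual term added to the physics prediction; and a shield that, when the predicted next temperature exceeds a limit, corrects the proposed action by gradient descent on the squared constraint violation. All names (`ThermalParams`, `shield_action`, `constant_bias_residual`) and all parameter values are hypothetical, and this is not the authors' RP-SDRL implementation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical learned residual r(T_room, T_supply, P_IT) -> correction in degC.
# In practice this would be a trained model; here it is any callable.
Residual = Callable[[float, float, float], float]


@dataclass
class ThermalParams:
    """Hypothetical single-zone parameters, chosen for illustration only."""
    capacity_kj_per_k: float = 5000.0   # C: lumped thermal capacitance
    conductance_kw_per_k: float = 20.0  # G = m_dot * c_p of the cooling airflow
    dt_s: float = 60.0                  # control interval in seconds


def physics_next_temp(t_room: float, t_supply: float, p_it_kw: float,
                      p: ThermalParams) -> float:
    """One explicit-Euler step of C dT/dt = P_IT - G (T_room - T_supply)."""
    q_cool_kw = p.conductance_kw_per_k * (t_room - t_supply)
    return t_room + p.dt_s * (p_it_kw - q_cool_kw) / p.capacity_kj_per_k


def predict_next_temp(t_room: float, t_supply: float, p_it_kw: float,
                      p: ThermalParams, residual: Residual) -> float:
    """Residual-physics prediction: first-law baseline plus learned correction."""
    return physics_next_temp(t_room, t_supply, p_it_kw, p) + residual(
        t_room, t_supply, p_it_kw)


def shield_action(t_room: float, t_supply_proposed: float, p_it_kw: float,
                  p: ThermalParams, residual: Residual, t_max: float,
                  u_min: float, u_max: float, margin: float = 0.1,
                  lr: float = 5.0, max_iter: int = 200,
                  eps: float = 1e-4) -> Tuple[float, bool]:
    """Keep the proposed action if the predicted temperature is within t_max;
    otherwise run gradient descent on 0.5 * (T_pred - (t_max - margin))**2
    with respect to the action until the prediction is safe.
    Returns (action, is_safe)."""
    def predict(u: float) -> float:
        return predict_next_temp(t_room, u, p_it_kw, p, residual)

    u = min(max(t_supply_proposed, u_min), u_max)
    for _ in range(max_iter):
        t_pred = predict(u)
        if t_pred <= t_max:
            return u, True
        # Aim slightly below the limit so the iterates actually cross it.
        violation = t_pred - (t_max - margin)
        # Central finite difference of the prediction w.r.t. the action.
        dt_du = (predict(u + eps) - predict(u - eps)) / (2.0 * eps)
        # Gradient step on the squared violation, clipped to the action bounds.
        u = min(max(u - lr * violation * dt_du, u_min), u_max)
    return u, predict(u) <= t_max


def constant_bias_residual(t_room: float, t_supply: float,
                           p_it_kw: float) -> float:
    """Hypothetical stand-in for a trained residual model: +0.1 degC bias."""
    return 0.1


if __name__ == "__main__":
    params = ThermalParams()
    # A DRL policy proposes a warm (energy-saving) 26 degC supply setpoint.
    u, safe = shield_action(24.0, 26.0, 100.0, params, constant_bias_residual,
                            t_max=25.0, u_min=15.0, u_max=27.0)
    print(f"shielded supply temp: {u:.2f} degC, safe={safe}")
```

The small margin below `t_max` is a design choice of this sketch: plain gradient descent on a squared hinge approaches the limit geometrically and never strictly reaches it, so targeting `t_max - margin` guarantees the loop terminates once a feasible action exists. If no action within `[u_min, u_max]` satisfies the constraint, the function returns `is_safe=False` and leaves the fallback to the caller.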