Xu Shichao, Fu Yangyang, Wang Yixuan, Yang Zhuoran, Huang Chao, O'Neill Zheng, Wang Zhaoran, Zhu Qi
Northwestern University, McCormick School of Engineering, Evanston, 60208, USA.
Department of Mechanical Engineering, Texas A&M University, College Station, 77843, Texas, USA.
Sci Rep. 2025 Mar 5;15(1):7677. doi: 10.1038/s41598-025-91326-z.
Building heating, ventilation, and air conditioning (HVAC) systems account for nearly half of building energy consumption and [Formula: see text] of total energy consumption in the US. Their operation is also crucial for ensuring the physical and mental health of building occupants. Compared with traditional model-based HVAC control methods, recent model-free deep reinforcement learning (DRL) based methods have shown good performance while not requiring the development of detailed and costly physical models. However, these model-free DRL approaches often suffer from long training times to reach good performance, which is a major obstacle to their practical deployment. In this work, we present a systematic approach to accelerate online reinforcement learning for HVAC control by taking full advantage of knowledge from domain experts in various forms. Specifically, the algorithm stages include learning expert functions from existing abstract physical models and from historical data via offline reinforcement learning, integrating the expert functions with rule-based guidelines, conducting training guided by the integrated expert function, and performing policy initialization from the distilled expert function. Moreover, to ensure that the learned DRL-based HVAC controller can effectively keep room temperature within the comfortable range for occupants, we design a runtime shielding framework to reduce the temperature violation rate and incorporate the learned controller into it. Experimental results demonstrate up to 8.8X speedup in DRL training from our approach over previous methods, with a low temperature violation rate.
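The abstract describes training guided by an integrated expert function. One common way to realize such guidance is to mix the expert's suggested action with the learning policy's action during exploration. The sketch below illustrates this general idea only; the function names, the mixing scheme, and the `beta` parameter are illustrative assumptions, not the paper's actual algorithm.

```python
import random

def guided_action(state, expert_policy, drl_policy, beta=0.3):
    """Pick an action for the current state during online training.

    With probability beta, follow the expert function's suggestion;
    otherwise take the DRL policy's action. In practice beta would be
    annealed toward zero as the DRL policy improves, so guidance fades
    once the learner no longer needs it.
    """
    if random.random() < beta:
        return expert_policy(state)
    return drl_policy(state)
```

With `beta=1.0` the controller always defers to the expert, and with `beta=0.0` it acts purely from the learned policy, so the single parameter interpolates between imitation and autonomy.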
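The runtime shielding framework mentioned in the abstract overrides the learned controller whenever its action would push room temperature outside the comfort band. A minimal sketch of that idea follows, assuming a scalar heating action in [0, 1] and an optional one-step temperature predictor; all names, thresholds, and the predictor interface are hypothetical, not the paper's implementation.

```python
def shield(action, temp, t_low=20.0, t_high=24.0, predict=None):
    """Return a safe action given the DRL controller's proposed action.

    If a one-step predictor is supplied, check the temperature that the
    proposed action would lead to; otherwise fall back to the current
    temperature. Override the action only when the comfort band
    [t_low, t_high] (degrees C) would be violated.
    """
    next_temp = predict(temp, action) if predict else temp
    if next_temp < t_low:
        return 1.0   # too cold: force maximum heating
    if next_temp > t_high:
        return 0.0   # too warm: force heating off
    return action    # within the comfort band: keep the DRL action
```

Because the shield only intervenes at the boundary of the comfort band, the learned controller remains responsible for efficiency in the interior of the safe region, which is why shielding can cut violations without erasing the energy savings of the DRL policy.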