School of Artificial Intelligence, Shenyang Aerospace University, Liaoning, China.
PLoS One. 2024 Jul 24;19(7):e0307767. doi: 10.1371/journal.pone.0307767. eCollection 2024.
Due to the complex internal dynamics of recirculating cooling water systems, most traditional control methods struggle to achieve stable and precise control. This paper therefore presents a novel adaptive control structure that combines the Twin Delayed Deep Deterministic Policy Gradient algorithm with a reference trajectory model (TD3-RTM). The structure is built on a Markov decision process formulation of the recirculating cooling water system. First, the TD3 algorithm is employed to construct a deep reinforcement learning agent. Next, a state space is selected and a dense reward function is designed to account for the multivariable character of the recirculating cooling water system. The agent updates its networks according to the reward values obtained through interaction with the system, gradually aligning its action values with the optimal policy. The reference trajectory model introduced by TD3-RTM accelerates the agent's convergence and reduces oscillation and instability in the control system. Finally, simulation experiments were conducted in MATLAB/Simulink. The results show that, compared with PID, fuzzy PID, DDPG, and TD3, the TD3-RTM method shortened the transient time in the flow loop by 6.09 s, 5.29 s, 0.57 s, and 0.77 s, respectively, and reduced the Integral of Absolute Error (IAE) index by 710.54, 335.1, 135.97, and 89.96, respectively; in the temperature loop, it shortened the transient time by 25.84 s, 13.65 s, 15.05 s, and 0.81 s and reduced the IAE index by 143.9, 59.13, 31.79, and 1.77, respectively. In addition, the overshoot of the TD3-RTM method in the flow loop was reduced by 17.64%, 7.79%, and 1.29% compared with PID, fuzzy PID, and TD3, respectively.
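The abstract does not give the exact form of the dense reward or of the reference trajectory model, so the following Python sketch is only one plausible reading: a first-order trajectory that moves the tracking target smoothly from the current output toward the setpoint, with a dense reward that penalizes deviation from that trajectory in both the flow and temperature loops. The class name, smoothing factor, reward weights, setpoints, and the stand-in proportional plant update are all illustrative assumptions, not the paper's values.

```python
# Hedged sketch of reference-trajectory reward shaping for a TD3-style agent.
# All numeric values and names below are assumptions for illustration only.

class ReferenceTrajectory:
    """First-order reference trajectory y_ref(k+1) = a*y_ref(k) + (1-a)*y_sp.

    Instead of rewarding closeness to the raw setpoint y_sp, the agent is
    rewarded for tracking this smooth path from the current output toward
    the setpoint, which damps oscillation and speeds convergence.
    """

    def __init__(self, y0: float, setpoint: float, alpha: float = 0.9):
        self.y_ref = y0          # trajectory starts at the measured output
        self.setpoint = setpoint
        self.alpha = alpha       # smoothing factor in (0, 1); assumed value

    def step(self) -> float:
        self.y_ref = self.alpha * self.y_ref + (1.0 - self.alpha) * self.setpoint
        return self.y_ref


def dense_reward(flow, temp, flow_ref, temp_ref, w_flow=1.0, w_temp=1.0):
    """Dense reward: negative weighted tracking error of both loops against
    their reference trajectories (weights w_flow, w_temp are assumptions)."""
    return -(w_flow * abs(flow - flow_ref) + w_temp * abs(temp - temp_ref))


# Illustrative one-episode rollout against a stand-in plant model.
flow_traj = ReferenceTrajectory(y0=10.0, setpoint=50.0)   # flow, assumed units
temp_traj = ReferenceTrajectory(y0=35.0, setpoint=28.0)   # temperature, assumed

flow, temp, dt, iae_flow = 10.0, 35.0, 1.0, 0.0
for k in range(100):
    f_ref, t_ref = flow_traj.step(), temp_traj.step()
    # In the paper the TD3 agent selects the control action; a crude
    # proportional update stands in here so the sketch runs end to end.
    flow += 0.2 * (f_ref - flow)
    temp += 0.2 * (t_ref - temp)
    r = dense_reward(flow, temp, f_ref, t_ref)
    # IAE (Integral of Absolute Error) against the setpoint: ∫|e(t)|dt,
    # the index reported in the abstract's comparison tables.
    iae_flow += abs(50.0 - flow) * dt
```

Because the reward tracks the trajectory rather than the raw setpoint, the error signal the agent sees early in a transient is small and well scaled, which is consistent with the abstract's claim that the reference trajectory model reduces oscillation and speeds up convergence.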