Department of Civil, Construction, and Environmental Engineering, Iowa State University, Ames, IA, 50011, USA.
Water Res. 2024 Oct 1;263:122179. doi: 10.1016/j.watres.2024.122179. Epub 2024 Jul 31.
The operation of modern wastewater treatment facilities is a balancing act in which a multitude of variables are controlled to achieve a wide range of objectives, many of which conflict. This is especially true within secondary activated sludge systems, where significant research and industry effort has been devoted to advancing control optimization strategies, both domain-driven and data-driven. Among data-driven control strategies, reinforcement learning (RL) stands out for its ability to achieve better-than-human performance in complex environments. While RL has been applied to activated sludge process optimization in the existing literature, these applications are typically limited in scope, and none has controlled more than three actions. Expanding the scope of RL control has the potential to improve optimization while concurrently reducing the number of control systems that must be tuned and maintained by operations staff. This study examined several facets of implementing multi-action, multi-objective RL agents, namely how many actions a single agent could successfully control and what extent of environment data was necessary to train such agents. This study observed improved control optimization with increasing action scope, though control of waste activated sludge remains a challenge. Furthermore, agents were able to maintain a high level of performance under decreased observation scope, up to a point. When compared to baseline control of the Benchmark Simulation Model No. 1 (BSM1), an RL agent controlling seven individual actions improved the average BSM1 performance metric by 8.3%, equivalent to an annual cost savings of $40,200 after accounting for the cost of additional sensors.