de Moura José Pinheiro, Rego Patrícia Helena Moraes, da Fonseca Neto João Viana
UEMA, Brazil.
ISA Trans. 2019 Jul;90:294-310. doi: 10.1016/j.isatra.2019.01.010. Epub 2019 Jan 30.
This paper presents a novel approach to the online design of optimal control systems for the bulk reclaiming process of a bucket wheel reclaimer (BWR). The approach is based on reinforcement learning, specifically Action Dependent Heuristic Dynamic Programming (ADHDP), which learns online and in real time the optimal Discrete Linear Quadratic Regulator (DLQR) control solution with integral action. Because of the geometric irregularities of the storage yard stacks and the variation in the physical and chemical characteristics of the stacked material, flow control of bulk solids by a bucket wheel reclaimer requires methods suited to highly imprecise process variables and an uncertain environment. Bulk solids are reclaimed by dividing the stack into layers, each approximately 4 m high, and dividing each layer into workbenches up to 12 m long. Reclaiming a workbench requires several translation steps (penetration into the stack), each between 0 and 1 m. To maintain the desired ore flow throughout the process, the BWR lance speed must be adjusted periodically. The main advantages of the proposed control method are that the decision rule is fully independent of the plant model and that the gains of the resulting controller are self-adjusting. The control system was designed so that the ADHDP-based DLQR controller with integral action acts on the plant in real time, using only the input and output signals and the states measured along the system trajectory.
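The abstract gives no equations or pseudocode, so the following is only a minimal sketch, under stated assumptions, of how a Q-function-based (ADHDP-style) value iteration can recover a DLQR gain with integral action from measured data alone. The second-order plant, the weights `Q_COST` and `R_COST`, the batch least-squares update (used here in place of an online recursive estimator), and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical stable plant, used ONLY as a black-box simulator (not the BWR model).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])

# Integral action: augment the state with v[k+1] = v[k] - C x[k] (setpoint taken as 0).
A_AUG = np.block([[A, np.zeros((2, 1))], [-C, np.ones((1, 1))]])
B_AUG = np.vstack([B, np.zeros((1, 1))])
N_X, N_U = 3, 1
Q_COST = np.eye(N_X)  # assumed state weight
R_COST = np.eye(N_U)  # assumed input weight


def plant_step(x, u):
    """Black box: the learner only sees the measured next state."""
    return A_AUG @ x + B_AUG @ u


def quad_features(z):
    """Quadratic basis such that quad_features(z) @ h == z @ H @ z for symmetric H."""
    rows, cols = np.triu_indices(len(z))
    scale = np.where(rows == cols, 1.0, 2.0)  # off-diagonal terms appear twice
    return np.outer(z, z)[rows, cols] * scale


def unpack_kernel(h, n):
    """Rebuild the symmetric kernel H from its upper-triangular parameters h."""
    H = np.zeros((n, n))
    H[np.triu_indices(n)] = h
    return H + H.T - np.diag(np.diag(H))


def learn_gain(n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    X, U, Xn = [], [], []
    # Collect short trajectories driven by random exploratory inputs.
    for _ in range(20):
        x = rng.standard_normal(N_X)
        for _ in range(10):
            u = rng.standard_normal(N_U)
            x_next = plant_step(x, u)
            X.append(x)
            U.append(u)
            Xn.append(x_next)
            x = x_next
    X, U, Xn = np.array(X), np.array(U), np.array(Xn)

    Phi = np.array([quad_features(np.concatenate([x, u])) for x, u in zip(X, U)])
    Phi_pinv = np.linalg.pinv(Phi)
    stage_cost = (np.einsum("ki,ij,kj->k", X, Q_COST, X)
                  + np.einsum("ki,ij,kj->k", U, R_COST, U))

    H = np.zeros((N_X + N_U, N_X + N_U))  # Q-function kernel, Q(x,u) = [x;u]' H [x;u]
    K = np.zeros((N_U, N_X))              # policy u = -K x
    for _ in range(n_iter):
        # Bellman target: stage cost plus current Q estimate at (x_next, -K x_next).
        Zn = np.hstack([Xn, -Xn @ K.T])
        targets = stage_cost + np.einsum("ki,ij,kj->k", Zn, H, Zn)
        H = unpack_kernel(Phi_pinv @ targets, N_X + N_U)  # least-squares critic update
        K = np.linalg.solve(H[N_X:, N_X:], H[N_X:, :N_X])  # greedy policy: K = Huu^-1 Hux
    return K, H


K_learned, H_learned = learn_gain()
print("learned gain K =", K_learned)
```

In this sketch the learner never reads `A_AUG` or `B_AUG` directly; it uses only the tuples (state, input, next state), which mirrors the model-independence claimed in the abstract. Reusing one exploratory data batch is a simplification chosen for brevity; an online variant would update the critic recursively as new measurements arrive.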