Yu Yao-Chi, Narayanan Vignesh, Li Jr-Shin
IEEE Trans Neural Netw Learn Syst. 2024 Sep;35(9):12653-12664. doi: 10.1109/TNNLS.2023.3264151. Epub 2024 Sep 4.
Controlling the collective behavior of a population of structurally similar dynamical systems, known as ensemble control, arises in diverse emerging applications and poses a grand challenge in systems science and control engineering. Because such systems are severely underactuated and large-scale sensor networks are difficult to deploy, ensemble systems can be actuated and monitored only at the population level. Moreover, mathematical models describing the dynamics of ensemble systems are often elusive. It is therefore essential to design broadcast controls that excite the entire population in a way that robustly compensates for the heterogeneity in system dynamics. In this article, we propose a reinforcement learning (RL)-based, data-driven control framework that uses population-level aggregated measurement data to learn a global control signal for steering a dynamic population in a desired manner. In particular, we introduce the notion of ensemble moments induced by aggregated measurements and derive the moment system associated with the original ensemble system. Then, using the moment system, we learn approximations of the optimal value functions and the associated policies in terms of ensemble moments through RL. We illustrate the feasibility and scalability of the proposed moment-based approach via numerical experiments on populations of linear, bilinear, and nonlinear ensemble systems. The proposed method achieves the desired control objectives across various ensemble control tasks and obtains a significantly better average reward than three existing methods.
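The idea of replacing individual states with ensemble moments can be illustrated with a small sketch. The abstract does not give the moment definition or the test systems, so everything here is an assumption made for illustration: a scalar linear ensemble dx/dt = beta * x + u with parameter beta in [-1, 1], monomial moments m_k = ∫ beta^k x(t, beta) d(beta), and the helper names `ensemble_moments` and `step_ensemble`. Under these assumptions the moments obey dm_k/dt = m_{k+1} + c_k u, where c_k = ∫ beta^k d(beta), and the code checks this numerically for one Euler step.

```python
import numpy as np

def ensemble_moments(x, beta, weights, order):
    """Monomial moments m_k = sum_i w_i * beta_i**k * x_i for k = 0..order."""
    powers = beta[None, :] ** np.arange(order + 1)[:, None]  # shape (order+1, N)
    return powers @ (weights * x)

def step_ensemble(x, beta, u, dt):
    """One forward-Euler step of dx/dt = beta * x + u for every member."""
    return x + dt * (beta * x + u)

# Assumption: Gauss-Legendre nodes on [-1, 1] stand in for the population,
# and their quadrature weights play the role of the aggregated measurement.
beta, weights = np.polynomial.legendre.leggauss(50)
x = np.ones_like(beta)           # every member starts at x = 1
u, dt, order = 0.5, 1e-3, 3      # same broadcast input u for all members

m_before = ensemble_moments(x, beta, weights, order + 1)
x_next = step_ensemble(x, beta, u, dt)
m_after = ensemble_moments(x_next, beta, weights, order)

# Moment dynamics implied by the assumed model: dm_k/dt = m_{k+1} + c_k * u,
# where c_k is the integral of beta**k over [-1, 1].
k = np.arange(order + 1)
c = np.where(k % 2 == 0, 2.0 / (k + 1), 0.0)
m_pred = m_before[:order + 1] + dt * (m_before[1:] + c * u)
max_err = np.max(np.abs(m_after - m_pred))
```

In this sketch each moment is driven by the next higher one, so a finite moment vector is only a truncation of the full dynamics. A learning agent in the spirit of the abstract would observe such a truncated moment vector as its state and choose the single input `u`; how the paper handles truncation and reward design is not stated in the abstract.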