Alali Mohammad, Kazeminajafabadi Armita, Imani Mahdi
Northeastern University, 360 Huntington Ave, Boston, MA, 02115, U.S.
Syst Sci Control Eng. 2024;12(1). doi: 10.1080/21642583.2024.2329260. Epub 2024 Apr 23.
Advances in technology have enabled the use of sensors with varied modalities to monitor different parts of a system, each providing a different level of information about the underlying system. However, in most complex systems, limited resources and computational power restrict the number of sensors, and the amount of data, that can be processed in real time. These constraints make it necessary to select or schedule a subset of sensors whose measurements best serve the monitoring objectives. This paper focuses on sensor scheduling for systems modeled by hidden Markov models. Although several sensor selection and scheduling methods have been developed, existing methods tend to be greedy and do not account for the long-term impact of the selected sensors on the monitoring objectives. This paper formulates optimal sensor scheduling as a reinforcement learning problem defined over the posterior distribution of the system states. Further, the paper develops a deep reinforcement learning method that learns the sensor scheduling policy offline; the learned policy can then be executed in real time as new information unfolds. The proposed method applies to any monitoring objective that can be expressed in terms of the posterior distribution of the states (e.g., state estimation or information gain). The accuracy and robustness of the proposed method are investigated in two applications: security monitoring of networked systems and health monitoring of gene regulatory networks.
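To make "a reinforcement learning problem defined over the posterior distribution of system states" concrete, the following is a minimal sketch and not the paper's implementation. It treats the posterior (belief) of a hidden Markov model as the state, uses negative posterior entropy as one example of a monitoring objective, and scores each sensor by its expected one-step reward. That one-step scoring is the greedy baseline the abstract contrasts against; a learned policy would replace `expected_reward` with a long-horizon value estimate over the belief. The two-state system, the sensor names, and all probabilities are hypothetical.

```python
import math

def predict(belief, T):
    """Propagate the belief one step; T[i][j] = P(next state j | state i)."""
    n = len(belief)
    return [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]

def update(prior, likelihood):
    """Bayes update; likelihood[j] = P(observation | state j, chosen sensor)."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def reward(belief):
    """Example objective on the posterior: negative Shannon entropy."""
    return sum(p * math.log(p) for p in belief if p > 0)

def expected_reward(belief, T, O):
    """Expected one-step reward of a sensor; O[j][y] = P(y | state j)."""
    prior = predict(belief, T)
    total = 0.0
    for y in range(len(O[0])):
        lik = [O[j][y] for j in range(len(prior))]
        p_y = sum(p * l for p, l in zip(prior, lik))
        if p_y > 0:
            total += p_y * reward(update(prior, lik))
    return total

# Hypothetical two-state system observed by two binary sensors.
T = [[0.9, 0.1], [0.2, 0.8]]
SENSORS = {
    "accurate": [[0.95, 0.05], [0.05, 0.95]],
    "noisy": [[0.6, 0.4], [0.4, 0.6]],
}
belief = [0.5, 0.5]

# Greedy (one-step) scheduling: pick the sensor with the best expected reward.
scores = {name: expected_reward(belief, T, O) for name, O in SENSORS.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because the reward depends only on the belief, swapping in a different objective (for example, the maximum posterior probability for state estimation) changes only `reward`, which is the property the abstract relies on when it says the method applies to any objective expressible through the posterior.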