Chen Yuheng, Niu Yingtao, Chen Changxing, Zhou Quan, Xiang Peng
Fundamentals Department, Air Force Engineering University of People's Liberation Army, Xi'an 710051, China.
The Sixty-Third Research Institute, National University of Defense Technology, Nanjing 210007, China.
Sensors (Basel). 2022 Oct 25;22(21):8159. doi: 10.3390/s22218159.
To achieve reliable transmission in wireless sensor networks under intelligent malicious jamming, we propose a Distributed Anti-Jamming Algorithm (DAJA) based on an actor-critic algorithm for a multi-agent system. A Multi-Agent Markov Decision Process (MAMDP) is introduced to model the anti-jamming communication process of the wireless sensor network, and the multi-agent system uses the actor-critic algorithm to learn the behavior of the intelligent jammer from the external environment. While coping effectively with both external and internal factors, each sensor in the network selects appropriate channels for transmission, so that the overall transmission of the system per unit time period is optimized. In the probabilistic intelligent jamming environment with tracking properties considered in this paper, simulations show that the proposed algorithm outperforms both an algorithm based on joint Q-learning and a conventional scheme based on orthogonal frequency hopping in terms of transmission performance. In addition, the proposed algorithm completes two updates in each iteration, one for strategy evaluation and one for action selection. This gives the system more efficient action selection and better adaptability through its interaction with the external environment, resulting in better transmission and convergence performance.
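The two updates per iteration mentioned in the abstract (strategy evaluation by the critic, action selection by the actor) can be illustrated with a minimal single-agent sketch. This is not the paper's DAJA or its multi-agent model. The channel count, the tracking probability, the step sizes, the reward of 1 for an unjammed slot, and the toy jammer in `jammed_channel` are all assumptions made only for illustration.

```python
import math
import random

# All constants below are illustrative assumptions, not values from the paper.
N_CHANNELS = 4      # number of available channels
P_TRACK = 0.8       # probability that the jammer tracks the last used channel
GAMMA = 0.8         # discount factor
ALPHA_CRITIC = 0.1  # step size for the critic (value) update
ALPHA_ACTOR = 0.05  # step size for the actor (policy) update


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def jammed_channel(prev_channel, rng):
    """Hypothetical tracking jammer: usually hits the previously used channel,
    otherwise jams a channel chosen uniformly at random."""
    if rng.random() < P_TRACK:
        return prev_channel
    return rng.randrange(N_CHANNELS)


def train(steps=20000, seed=0):
    """Tabular one-step actor-critic for a single sensor choosing channels."""
    rng = random.Random(seed)
    # State = channel the sensor used in the previous slot.
    values = [0.0] * N_CHANNELS                              # critic: V(s)
    prefs = [[0.0] * N_CHANNELS for _ in range(N_CHANNELS)]  # actor: theta(s, a)
    state = rng.randrange(N_CHANNELS)
    successes = []
    for _ in range(steps):
        probs = softmax(prefs[state])
        action = rng.choices(range(N_CHANNELS), weights=probs)[0]
        # Reward 1 if the chosen channel escapes the jammer, else 0.
        reward = 0.0 if action == jammed_channel(state, rng) else 1.0
        next_state = action
        td_error = reward + GAMMA * values[next_state] - values[state]
        # Update 1 (strategy evaluation): move V(s) toward the TD target.
        values[state] += ALPHA_CRITIC * td_error
        # Update 2 (action selection): softmax policy-gradient step
        # weighted by the same TD error.
        for a in range(N_CHANNELS):
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[state][a] += ALPHA_ACTOR * td_error * grad
        successes.append(reward)
        state = next_state
    policy = [softmax(row) for row in prefs]
    return policy, successes


if __name__ == "__main__":
    policy, successes = train()
    recent = successes[-2000:]
    print("success rate over the last 2000 slots:", sum(recent) / len(recent))
    for s, row in enumerate(policy):
        print("previous channel", s, "->", [round(p, 3) for p in row])
```

In this toy setting the learned policy should put little probability on reusing the previous channel, since that is the one the assumed jammer tracks. A multi-agent version along the lines of the abstract would additionally have to handle interference between sensors, which this sketch leaves out.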