Zhou Zhanyang, Niu Yingtao, Wan Boyu, Zhou Wenhao
Sixty-Third Research Institute, National University of Defense Technology, Nanjing 210007, China.
Fundamentals Department, Air Force Engineering University of PLA, Xi'an 710051, China.
Entropy (Basel). 2023 Nov 16;25(11):1547. doi: 10.3390/e25111547.
Malicious jammers threaten the reliability of wireless communication systems. Many schemes have been proposed to maintain reliable communication under malicious jamming by avoiding the channels that a jammer blocks. However, existing anti-jamming schemes, such as fixed strategies, reinforcement learning (RL), and deep Q network (DQN) methods, make limited use of historical data: most attend only to changes in the current state and cannot gain experience from historical samples. To address this, this manuscript proposes anti-jamming communication using imitation learning. Specifically, it addresses anti-jamming decision-making for wireless communication in scenarios with malicious jamming and proposes an algorithm that consists of three steps. First, a heuristic-based Expert Trajectory Generation Algorithm is proposed as the expert strategy, which makes it possible to obtain expert trajectories from historical samples; here, a trajectory is the sequence of actions the expert takes in various situations. Second, a user strategy is obtained by imitating the expert strategy with an imitation learning neural network. Finally, the resulting user strategy is used to make efficient, sequential anti-jamming decisions. Simulation results indicate that the proposed method outperforms the RL-based and DQN-based anti-jamming methods on continuous-state spectrum anti-jamming problems, avoids the "curse of dimensionality", and is more robust to channel fading, noise, and changes in the jamming pattern.
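The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the expert heuristic, the network, or the jamming model, so everything below is an assumption made for illustration. The jammer is assumed to sweep one channel per slot, the expert is a hypothetical hindsight rule that picks the channel half a band away from the channel jammed in the next slot, and a plain softmax regression stands in for the imitation learning neural network. All function names (`make_sweep_history`, `expert_trajectory`, `clone_expert`, `user_policy`) are hypothetical.

```python
import numpy as np

def make_sweep_history(num_slots, num_channels, noise_std, rng):
    """Assumed history: a sweep jammer hits channel t % C in slot t, plus Gaussian noise."""
    power = rng.normal(0.0, noise_std, size=(num_slots, num_channels))
    slots = np.arange(num_slots)
    power[slots, slots % num_channels] += 1.0
    return power

def expert_trajectory(history):
    """Step 1 (assumed heuristic): label each slot's spectrum with a hindsight action.

    The expert looks at the next slot, finds the jammed channel there,
    and picks the channel half a band away from it (cyclically).
    """
    num_channels = history.shape[1]
    states = history[:-1]
    jammed_next = np.argmax(history[1:], axis=1)
    actions = (jammed_next + num_channels // 2) % num_channels
    return states, actions

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def clone_expert(states, actions, num_channels, lr=0.5, epochs=300):
    """Step 2: behavior cloning by softmax regression (stand-in for a neural network)."""
    weights = np.zeros((states.shape[1], num_channels))
    bias = np.zeros(num_channels)
    targets = np.eye(num_channels)[actions]
    for _ in range(epochs):
        # Gradient of the mean cross-entropy loss with respect to the logits.
        error = softmax(states @ weights + bias) - targets
        weights -= lr * states.T @ error / len(states)
        bias -= lr * error.mean(axis=0)
    return weights, bias

def user_policy(state, weights, bias):
    """Step 3: choose a channel from the current spectrum only, with no hindsight."""
    return int(np.argmax(state @ weights + bias))

rng = np.random.default_rng(0)
train = make_sweep_history(500, 5, 0.1, rng)
states, actions = expert_trajectory(train)
weights, bias = clone_expert(states, actions, num_channels=5)

# Evaluate on a fresh history: a slot counts as a success if the chosen
# channel is not the one the jammer hits in the next slot.
test = make_sweep_history(200, 5, 0.1, rng)
chosen = np.array([user_policy(s, weights, bias) for s in test[:-1]])
jammed_next = np.arange(1, 200) % 5
success_rate = float(np.mean(chosen != jammed_next))
print(f"jam-free slots: {success_rate:.2%}")
```

The split mirrors the abstract: the expert may use hindsight because it works on recorded samples, while the user strategy sees only the current spectrum and must generalize from the expert's labeled trajectory.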