Sun Ming, Jin Yanhui, Wang Shumei, Mei Erzhuang
College of Computer and Control Engineering, Qiqihar University, Qiqihar 161006, China.
School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China.
Entropy (Basel). 2022 Nov 24;24(12):1722. doi: 10.3390/e24121722.
Device-to-device (D2D) technology enables direct communication between devices and can effectively alleviate the shortage of spectrum resources in 5G communication. Because channels are shared among multiple D2D user pairs, however, serious interference can arise between those pairs. To reduce interference, increase network capacity, and improve wireless spectrum utilization, this paper proposes a distributed resource allocation algorithm that combines a deep Q-network (DQN) with an unsupervised learning network. First, a DQN algorithm is constructed to solve channel allocation in a dynamic and unknown environment in a distributed manner. Then, a deep power control neural network trained with an unsupervised learning strategy outputs an optimized channel power control scheme that maximizes the transmit sum-rate of the spectrum through corresponding constraint processing. Unlike traditional centralized approaches, which require collecting instantaneous global network information, the proposed algorithm treats each transmitter as a learning agent that performs channel selection and power control using a small amount of locally collected state information. Simulation results show that the proposed algorithm converges faster and achieves a higher transmit sum-rate than other traditional centralized and distributed algorithms.
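The abstract does not state the sum-rate objective explicitly. As a minimal sketch, assuming the common Shannon-rate formulation in which only D2D pairs on the same channel interfere with one another, the quantity being maximized could be computed as follows. The function name `transmit_sum_rate`, the `gain[j][i]` layout, and the default noise value are illustrative assumptions, not taken from the paper.

```python
import math


def transmit_sum_rate(gain, channel, power, noise=1e-9):
    """Sum of Shannon rates (bit/s/Hz) over all D2D pairs.

    Assumed (hypothetical) conventions:
      gain[j][i]  channel gain from transmitter j to receiver i
      channel[i]  index of the channel selected by pair i
      power[i]    transmit power of pair i
      noise       receiver noise power
    """
    n = len(channel)
    total = 0.0
    for i in range(n):
        # Only transmitters that share pair i's channel cause interference.
        interference = sum(
            power[j] * gain[j][i]
            for j in range(n)
            if j != i and channel[j] == channel[i]
        )
        sinr = power[i] * gain[i][i] / (noise + interference)
        total += math.log2(1.0 + sinr)
    return total


# Two pairs with strong direct links and weak cross links (made-up values).
gain = [[1.0, 0.1],
        [0.1, 1.0]]
power = [1.0, 1.0]

separate = transmit_sum_rate(gain, [0, 1], power, noise=1.0)
shared = transmit_sum_rate(gain, [0, 0], power, noise=1.0)
print(separate, shared)
```

With these made-up values, placing the two pairs on different channels yields a higher sum-rate than placing them on the same channel. This illustrates why channel selection and power control are optimized jointly; it makes no claim about the paper's actual system model or reward design.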