Hudson Henrique de Souza Lopes, Lucas Jose Ferreira Lima, Telma Woerle de Lima Soares, Flávio Henrique Teles Vieira
School of Electrical, Mechanical and Computer Engineering (EMC), Federal University of Goiás (UFG), Goiânia 74605-010, GO, Brazil.
Advanced Knowledge Center for Immersive Technologies (AKCIT), Federal University of Goiás (UFG), Goiânia 74605-010, GO, Brazil.
Sensors (Basel). 2024 Sep 20;24(18):6079. doi: 10.3390/s24186079.
Next-generation mobile networks, such as those beyond the 5th generation (B5G) and 6th generation (6G), must satisfy diverse network resource demands. Network slicing (NS) and device-to-device (D2D) communication have emerged as promising solutions for network operators. With NS, a single physical network infrastructure is divided into multiple virtual slices, each tailored to different service requirements. Combining D2D and NS can improve spectrum utilization, yielding better performance and scalability. This paper addresses the challenging problem of dynamic resource allocation in wireless networks with slicing and D2D communication using deep reinforcement learning (DRL) techniques. More specifically, we propose an approach named DDPG-KRP, based on the deep deterministic policy gradient (DDPG) combined with K-nearest neighbors (KNN) and reward penalization (RP) to eliminate undesirable actions, in order to determine the resource allocation policy that maximizes long-term rewards. The simulation results show that DDPG-KRP is an efficient solution for resource allocation in wireless networks with slicing, outperforming the other DRL algorithms considered.
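The abstract only names the ingredients of DDPG-KRP, so the sketch below is a minimal, illustrative reading of how such an agent could select actions: a DDPG actor emits a continuous proto-action, KNN maps it to the closest discrete resource-allocation candidates, the critic ranks those candidates, and a reward penalty discourages undesirable (constraint-violating) choices. Every function, shape, and constant here (actor, critic, is_undesirable, PENALTY, the action table) is an assumption for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a DDPG + KNN + reward-penalization action step.
# All components are stand-ins; the paper's networks and constraints differ.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS, ACT_DIM, K = 64, 4, 5   # discrete action set size, embedding dim, neighbors
PENALTY = -1.0                     # penalty for undesirable actions (assumed value)

# Discrete action embeddings, e.g. candidate (slice, resource-block) allocations.
action_table = rng.uniform(0.0, 1.0, size=(N_ACTIONS, ACT_DIM))

def actor(state):
    """Stand-in for the DDPG actor network: state -> continuous proto-action."""
    return np.tanh(state[:ACT_DIM])

def critic(state, action):
    """Stand-in for the DDPG critic network: Q(state, action)."""
    return -np.linalg.norm(action - state[:ACT_DIM])

def is_undesirable(action):
    """Assumed feasibility check, e.g. an interference or SLA constraint."""
    return action.sum() > 3.0

def select_action(state):
    proto = actor(state)
    # KNN lookup: the K discrete actions closest to the continuous proto-action.
    dists = np.linalg.norm(action_table - proto, axis=1)
    candidates = action_table[np.argsort(dists)[:K]]
    # Critic refinement: keep the neighbor with the highest Q-value.
    q_values = np.array([critic(state, a) for a in candidates])
    return candidates[int(np.argmax(q_values))]

def shaped_reward(raw_reward, action):
    # Reward penalization (RP): undesirable actions receive a penalty so the
    # learned policy steers away from them over time.
    return raw_reward + (PENALTY if is_undesirable(action) else 0.0)

state = rng.standard_normal(8)
a = select_action(state)
print("chosen action:", a, "reward:", shaped_reward(1.0, a))
```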