He Huaiwen, Yang Xiangdong, Mi Xin, Shen Hong, Liao Xuefeng
School of Computer, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan 528400, China.
Computer Science and Engineering School, University of Electronic Science and Technology of China, Chengdu 611731, China.
Sensors (Basel). 2024 Aug 8;24(16):5141. doi: 10.3390/s24165141.
Device-to-device (D2D) communication is a pivotal technology for next-generation networks, allowing direct task offloading between mobile devices (MDs) to improve the utilization of idle resources. This paper proposes a novel algorithm for dynamic task offloading between active MDs and idle MDs in a D2D mobile edge computing (D2D-MEC) system, deploying multi-agent deep reinforcement learning (DRL) to minimize the long-term average delay of delay-sensitive tasks under deadline constraints. Our core innovation is a dynamic partitioning scheme for idle and active devices in the D2D-MEC system that accounts for stochastic task arrivals and multi-time-slot task execution, a setting insufficiently explored in the existing literature. We adopt a queue-based model to formulate the dynamic task offloading optimization problem. To address the challenges of a large action space and the coupling of actions across time slots, we model the problem as a Markov decision process (MDP) and solve it with multi-agent proximal policy optimization (MAPPO). We employ a centralized training with decentralized execution (CTDE) framework, enabling each MD to make offloading decisions based solely on its local system state. Extensive simulations demonstrate the efficiency and fast convergence of our algorithm. Compared with existing sub-optimal results obtained with single-agent DRL, our algorithm reduces the average task completion delay by 11.0% and the ratio of dropped tasks by 17.0%. The proposed algorithm is particularly relevant to sensor networks, where sensor-equipped mobile devices generate large volumes of data that require timely processing to ensure quality of experience (QoE) and meet the service-level agreements (SLAs) of delay-sensitive applications.
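To make the queue-based formulation concrete, the following is a minimal sketch of a single device's task queue with Bernoulli task arrivals, multi-slot execution, and deadline-driven drops, yielding the two metrics the abstract reports (average completion delay and drop ratio). All parameters (ARRIVAL_PROB, CYCLES_PER_SLOT, DEADLINE_SLOTS) and the single-queue structure are illustrative assumptions, not values or details from the paper.

```python
import random
from collections import deque

# Illustrative parameters (assumptions, not from the paper).
ARRIVAL_PROB = 0.4      # per-slot probability that an active MD generates a task
CYCLES_PER_SLOT = 2.0   # work units the executing device can serve per slot
DEADLINE_SLOTS = 8      # tasks older than this are dropped (deadline constraint)

class Task:
    def __init__(self, arrival_slot):
        self.arrival_slot = arrival_slot
        self.remaining = random.uniform(1.0, 6.0)  # required work; may span slots

def simulate(num_slots=10_000):
    queue = deque()
    completed_delays, dropped = [], 0
    for t in range(num_slots):
        # Stochastic arrival: a new delay-sensitive task may be generated.
        if random.random() < ARRIVAL_PROB:
            queue.append(Task(t))
        # Drop head-of-line tasks that have missed their deadline (FIFO keeps
        # the oldest task at the head, so this loop catches all expired tasks).
        while queue and t - queue[0].arrival_slot >= DEADLINE_SLOTS:
            queue.popleft()
            dropped += 1
        # Serve the head task; execution can continue across multiple slots.
        budget = CYCLES_PER_SLOT
        while queue and budget > 0:
            task = queue[0]
            work = min(budget, task.remaining)
            task.remaining -= work
            budget -= work
            if task.remaining <= 0:
                completed_delays.append(t - task.arrival_slot + 1)
                queue.popleft()
    total = len(completed_delays) + dropped
    avg_delay = sum(completed_delays) / max(len(completed_delays), 1)
    return avg_delay, dropped / max(total, 1)

if __name__ == "__main__":
    avg_delay, drop_ratio = simulate()
    print(f"average completion delay: {avg_delay:.2f} slots, "
          f"drop ratio: {drop_ratio:.2%}")
```

In the full D2D-MEC setting, each slot's offloading decision determines which device's queue a task joins; the multi-agent DRL policy described in the abstract optimizes exactly that routing to minimize long-term average delay.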
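The CTDE framework mentioned above pairs decentralized actors with a centralized critic. The sketch below shows that generic pattern in PyTorch: each MD's actor maps its local observation to offloading-action logits, while a shared critic evaluates the global state during training only. The network sizes, dimensions (OBS_DIM, STATE_DIM, N_ACTIONS, N_AGENTS), and architecture are hypothetical, not the paper's design; the PPO clipped-surrogate update itself is omitted.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the paper).
OBS_DIM, STATE_DIM, N_ACTIONS, N_AGENTS = 8, 32, 5, 4

class Actor(nn.Module):
    """Decentralized policy: local observation -> offloading action logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))
    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class CentralCritic(nn.Module):
    """Centralized value function: global state -> scalar value estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
    def forward(self, state):
        return self.net(state).squeeze(-1)

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# Execution: each MD samples an offloading action from its LOCAL observation.
local_obs = torch.randn(N_AGENTS, OBS_DIM)
actions = [actors[i](local_obs[i]).sample() for i in range(N_AGENTS)]

# Training: the critic sees the GLOBAL state; its value estimate baselines a
# PPO clipped-surrogate objective for each actor (update step not shown).
global_state = torch.randn(STATE_DIM)
value = critic(global_state)
```

Only the lightweight actors need to be deployed on the MDs at run time, which is what lets each device make offloading decisions from its local system state alone, as the abstract describes.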