Wang Lixing, Jiao Huirong
School of Computer Science and Engineering, Northeastern University, Shenyang 110000, China.
Sensors (Basel). 2024 Dec 15;24(24):8014. doi: 10.3390/s24248014.
Natural disasters cause significant losses. Unmanned aerial vehicles (UAVs) are valuable in rescue missions, but their limited computing power and battery life force them to offload tasks to edge servers. This study proposes a task offloading decision algorithm called the multi-agent deep deterministic policy gradient with cooperation and experience replay (CER-MADDPG), which is based on multi-agent reinforcement learning for UAV computation offloading. CER-MADDPG emphasizes collaboration between UAVs and classifies historical UAV experiences to obtain optimal strategies. It enables collaboration among edge devices through the design of its critic network. Additionally, by defining good and bad experiences for UAVs, experiences are classified into two separate buffers, allowing UAVs to learn from both, reinforcing beneficial behavior while avoiding harmful behavior and thereby reducing system overhead. The performance of CER-MADDPG was verified through simulations in two respects. First, the influence of key hyperparameters on performance was examined, and their optimal values were determined. Second, CER-MADDPG was compared with other baseline algorithms. The results show that, compared with MADDPG and stochastic game-based resource allocation with prioritized experience replay, CER-MADDPG achieves the lowest system overhead along with superior stability and scalability.
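The dual-buffer experience-replay mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `DualReplayBuffer`, the running-mean-reward threshold used to label a transition as good or bad, and the `good_ratio` mixing parameter are all assumptions introduced for the sketch; the paper's exact good/bad criterion may differ.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Two-buffer experience replay: transitions whose reward meets a
    running-average threshold go to the 'good' buffer, the rest to the
    'bad' buffer. Sampling mixes both, so an agent can reinforce
    beneficial behavior while also learning what to avoid.
    (Illustrative sketch; the classification rule is an assumption.)"""

    def __init__(self, capacity=10000, good_ratio=0.7):
        self.good = deque(maxlen=capacity)
        self.bad = deque(maxlen=capacity)
        self.good_ratio = good_ratio   # fraction of each batch drawn from 'good'
        self.reward_sum = 0.0          # running statistics for the threshold
        self.count = 0

    def add(self, state, action, reward, next_state, done):
        # Classify against the running mean reward (an assumed criterion).
        self.reward_sum += reward
        self.count += 1
        threshold = self.reward_sum / self.count
        transition = (state, action, reward, next_state, done)
        (self.good if reward >= threshold else self.bad).append(transition)

    def sample(self, batch_size):
        # Draw a mixed batch; clamp to whatever each buffer actually holds.
        n_good = min(int(batch_size * self.good_ratio), len(self.good))
        n_bad = min(batch_size - n_good, len(self.bad))
        return random.sample(self.good, n_good) + random.sample(self.bad, n_bad)
```

In a MADDPG-style training loop, each UAV agent would push its offloading transitions into such a buffer and sample mixed batches when updating its actor and critic networks.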