Deep Reinforcement Learning-Based Resource Allocation for Satellite Internet of Things with Diverse QoS Guarantee

Authors

Tang Siqi, Pan Zhisong, Hu Guyu, Wu Yang, Li Yunbo

Affiliations

Command & Control Engineering College, Army Engineering University of PLA, Nanjing 210007, China.

Beijing Information and Communications Technology Research Center, Beijing 100036, China.

Publication information

Sensors (Basel). 2022 Apr 13;22(8):2979. doi: 10.3390/s22082979.

DOI:10.3390/s22082979

PMID:35458964

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9024869/

Abstract

The diverse QoS requirements of large-scale terminals are a key challenge for resource allocation in the Satellite Internet of Things (S-IoT). This paper presents a deep reinforcement learning-based online channel allocation and power control algorithm for an S-IoT uplink scenario. The intelligent agent determines the transmission channel and power simultaneously based on contextual information. Furthermore, a weighted normalized reward covering success rate, power efficiency, and QoS requirements is adopted to balance resource efficiency against QoS satisfaction. Finally, a practical deployment mechanism based on transfer learning is proposed to improve onboard training efficiency and reduce the computational cost of training. Simulations demonstrate that the proposed method balances success rate and power efficiency while guaranteeing QoS requirements. Under S-IoT's normal operating conditions, the proposed method improves power efficiency by 60.91% and 144.44% compared with GA and DRL_RA, respectively, while its power efficiency is only 4.55% lower than that of DRL-EERA. In addition, the method can be transferred and deployed to a space environment with merely 100 onboard training steps.
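The abstract names two mechanisms without giving formulas: a joint choice of channel and power, and a weighted normalized reward. The following is a minimal sketch of how such a design could look, not the paper's actual formulation. The names `RewardWeights`, `decode_action`, and `weighted_reward`, the weight values, and the use of `1 - power / max_power` as a stand-in for power efficiency are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract does not state their values.
    success: float = 0.4
    power: float = 0.3
    qos: float = 0.3


def decode_action(action: int, num_power_levels: int) -> Tuple[int, int]:
    """Map one flat action index to a (channel, power_level) pair,
    so a single agent output selects both at once."""
    return divmod(action, num_power_levels)


def weighted_reward(success: bool, power: float, max_power: float,
                    qos_met: bool,
                    w: RewardWeights = RewardWeights()) -> float:
    """Weighted sum of three terms, each normalized to [0, 1]."""
    if not 0.0 < power <= max_power:
        raise ValueError("power must lie in (0, max_power]")
    r_success = 1.0 if success else 0.0
    # Assumed proxy for power efficiency: a successful transmission
    # scores higher the less power it used; a failure scores zero.
    r_power = (1.0 - power / max_power) if success else 0.0
    r_qos = 1.0 if qos_met else 0.0
    total = w.success + w.power + w.qos
    # Dividing by the weight sum keeps the reward in [0, 1].
    return (w.success * r_success + w.power * r_power + w.qos * r_qos) / total
```

With three power levels, `decode_action(7, 3)` selects channel 2 at power level 1. Raising `w.qos` relative to the other weights would push the agent toward meeting QoS requirements at the expense of power efficiency, which is the trade-off the abstract describes.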
