Yuan Yaxiong, Lei Lei, Vu Thang X, Chatzinotas Symeon, Sun Sumei, Ottersten Björn
Interdisciplinary Center for Security, Reliability and Trust, University of Luxembourg, 1855 Kirchberg, Luxembourg, Luxembourg.
Institute for Infocomm Research, Agency for Science, Technology, and Research, Singapore, 138632 Singapore.
EURASIP J Wirel Commun Netw. 2021;2021(1):78. doi: 10.1186/s13638-021-01960-0. Epub 2021 Apr 7.
In unmanned aerial vehicle (UAV)-assisted networks, a UAV acts as an aerial base station that acquires the requested data via a backhaul link and then serves ground users (GUs) through an access network. In this paper, we investigate an energy minimization problem with a limited power supply for both the backhaul and access links. The difficulty in solving such a non-convex and combinatorial problem lies in its high computational complexity and long computation time. In developing solutions, we consider approaches from both the actor-critic deep reinforcement learning (AC-DRL) and the optimization perspectives. First, two offline non-learning algorithms, an optimal algorithm and a heuristic algorithm, based on piecewise linear approximation and relaxation, are developed as benchmarks. Second, toward real-time decision-making, we improve the conventional AC-DRL and propose two learning schemes: AC-based user group scheduling and backhaul power allocation (ACGP), and joint AC-based user group scheduling and optimization-based backhaul power allocation (ACGOP). Numerical results show that the computation time of both ACGP and ACGOP is ten to a hundred times shorter than that of the offline approaches, and that ACGOP outperforms ACGP in energy savings. The results also verify that the proposed learning solutions are superior to the conventional AC-DRL in guaranteeing feasibility and minimizing the system energy.
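The abstract names piecewise linear approximation as one ingredient of the offline benchmark algorithms but does not say which function is approximated or how the breakpoints are chosen. The following is a minimal sketch of the general technique only, not the authors' formulation: it assumes a Shannon-type rate curve `log2(1 + p)` and evenly spaced breakpoints, and the names `pwl_breakpoints`, `pwl_eval`, `rate`, and `max_abs_error` are hypothetical.

```python
import math


def pwl_breakpoints(f, lo, hi, segments):
    """Return (x, f(x)) pairs at evenly spaced breakpoints on [lo, hi]."""
    step = (hi - lo) / segments
    return [(lo + i * step, f(lo + i * step)) for i in range(segments + 1)]


def pwl_eval(points, x):
    """Evaluate the piecewise linear interpolant through `points` at x."""
    if not points[0][0] <= x <= points[-1][0]:
        raise ValueError("x lies outside the approximated interval")
    # Walk consecutive breakpoint pairs and interpolate on the first
    # segment whose right end is at or beyond x.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def rate(p):
    """Assumed illustrative nonlinear term: rate as a function of power."""
    return math.log2(1.0 + p)


def max_abs_error(f, points, samples=1000):
    """Largest sampled gap between f and its piecewise linear interpolant."""
    lo, hi = points[0][0], points[-1][0]
    worst = 0.0
    for k in range(samples + 1):
        x = lo + (hi - lo) * k / samples
        worst = max(worst, abs(f(x) - pwl_eval(points, x)))
    return worst


coarse = pwl_breakpoints(rate, 0.0, 10.0, 4)   # 4 linear segments
fine = pwl_breakpoints(rate, 0.0, 10.0, 16)    # 16 linear segments
```

Comparing `max_abs_error(rate, coarse)` with `max_abs_error(rate, fine)` shows the usual trade-off: more segments tighten the approximation, but in a mixed-integer formulation each extra segment typically adds variables and constraints, which is consistent with the high offline computation time the abstract reports.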