Qureshi Khalid Ibrahim, Lu Bingxian, Lu Cheng, Lodhi Muhammad Ali, Wang Lei
Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian 116024, China.
Sensors (Basel). 2024 Oct 10;24(20):6535. doi: 10.3390/s24206535.
In this paper, we present a novel method to enhance sum-rate efficiency in full-duplex unmanned aerial vehicle (UAV)-assisted communication networks. Existing approaches often couple uplink and downlink associations, resulting in suboptimal performance, particularly in dynamic environments where user demands and network conditions are unpredictable. To overcome these limitations, we propose decoupling the uplink and downlink associations of ground-based users (GBUs), significantly improving network efficiency. We formulate a joint optimization problem that integrates UAV trajectory design and user association, aiming to maximize the overall sum-rate efficiency of the network. Because the problem is non-convex, we reformulate it as a Partially Observable Markov Decision Process (POMDP), enabling UAVs to make real-time decisions based on local observations without requiring complete global information. Our framework employs multi-agent deep reinforcement learning (MADRL), specifically the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, which combines centralized training with distributed execution. This allows UAVs to efficiently learn user associations and trajectory controls while dynamically adapting to local conditions. The proposed solution is particularly suited to critical applications such as disaster response and search and rescue missions, where UAVs enable rapid network deployment in emergencies. By addressing the limitations of existing purely centralized and purely distributed solutions, our hybrid model pairs the benefits of centralized training with the adaptability of distributed inference for real-time UAV operation.
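The benefit of decoupling can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes each user-UAV link has a fixed, hypothetical uplink and downlink rate that does not depend on which other users are associated (so interference, full-duplex self-interference, and UAV movement are all ignored). The function names and rate values are made up for illustration. Under these assumptions, letting each user pick its uplink and downlink UAV independently can never give a lower sum rate than forcing both directions onto one UAV.

```python
# Toy comparison of coupled vs. decoupled user association (illustrative only).
# rates_ul[u][k]: hypothetical uplink rate from user u to UAV k (bit/s/Hz)
# rates_dl[u][k]: hypothetical downlink rate from UAV k to user u (bit/s/Hz)
# Assumption: rates are fixed and independent of other users' choices.

def coupled_sum_rate(rates_ul, rates_dl):
    """Each user uses the single UAV that maximizes uplink + downlink rate."""
    total = 0.0
    for ul_row, dl_row in zip(rates_ul, rates_dl):
        total += max(ul + dl for ul, dl in zip(ul_row, dl_row))
    return total

def decoupled_sum_rate(rates_ul, rates_dl):
    """Each user picks its best uplink UAV and best downlink UAV separately."""
    total = 0.0
    for ul_row, dl_row in zip(rates_ul, rates_dl):
        total += max(ul_row) + max(dl_row)
    return total

# Two users, two UAVs, with made-up rates.
rates_ul = [[2.0, 1.0], [0.5, 1.5]]
rates_dl = [[1.0, 3.0], [2.5, 1.0]]

print(coupled_sum_rate(rates_ul, rates_dl))    # 7.0
print(decoupled_sum_rate(rates_ul, rates_dl))  # 9.0
```

In the setting the abstract describes, rates depend on UAV positions and on interference between links, which is why association is optimized jointly with trajectory and learned with MADDPG rather than computed by a per-user maximum as in this sketch.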