Deep Q-Learning-Based Transmission Power Control of a High Altitude Platform Station with Spectrum Sharing.

Author information

Jo Seongjun, Yang Wooyeol, Choi Haing Kun, Noh Eonsu, Jo Han-Shin, Park Jaedon

Affiliations

Department of Electronic Engineering, Hanbat National University, Daejeon 34158, Korea.

TnB Radio Tech., Seoul 08504, Korea.

Publication information

Sensors (Basel). 2022 Feb 19;22(4):1630. doi: 10.3390/s22041630.

DOI:10.3390/s22041630

PMID:35214535

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8878605/

Abstract

A High Altitude Platform Station (HAPS) can provide high-speed data communication over wide areas using high-power line-of-sight links; however, it can cause significant interference to existing systems. When the HAPS shares spectrum with existing systems, its transmission power must be adjusted to satisfy the interference requirement for incumbent protection. However, excessive transmission power reduction can severely degrade the HAPS coverage. To solve this problem, we propose a multi-agent Deep Q-learning (DQL)-based transmission power control algorithm that minimizes the outage probability of the HAPS downlink while satisfying the interference requirement of an interfered system. In addition, a double DQL (DDQL) is developed to prevent the potential risk of action-value overestimation in DQL. With a properly designed state, reward, and training process, all agents cooperatively learn a power control policy that achieves a near-optimal solution. The proposed DQL power control algorithm performs equally to or close to the optimal exhaustive search algorithm for varying positions of the interfered system. The proposed DQL and DDQL power control algorithms yield the same performance, which indicates that action-value overestimation does not adversely affect the quality of the learned policy.
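The abstract contrasts DQL with double DQL (DDQL) as a guard against action-value overestimation. The sketch below illustrates that difference with the widely used Double DQN bootstrap target, assuming a discrete action set such as a hypothetical list of HAPS transmit-power levels. The Q-values, reward, and discount factor are made-up numbers, the function names are hypothetical, and the abstract does not confirm that the paper computes its targets exactly this way.

```python
"""Sketch: DQL vs. double DQL (DDQL) bootstrap targets for one transition.

Assumption (not from the paper): Q-values for the next state are plain
lists indexed by a discrete action, e.g. hypothetical transmit-power levels.
"""
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dql_target(reward: float, q_next_target: Sequence[float],
               gamma: float, done: bool) -> float:
    """Standard DQL target: the target network both selects and evaluates
    the next action, i.e. r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + gamma * max(q_next_target)


def ddql_target(reward: float, q_next_online: Sequence[float],
                q_next_target: Sequence[float], gamma: float,
                done: bool) -> float:
    """Double DQL target: the online network selects the next action and
    the target network evaluates it, decoupling selection from evaluation."""
    if done:
        return reward
    a_star = argmax(q_next_online)
    return reward + gamma * q_next_target[a_star]


if __name__ == "__main__":
    # Hypothetical next-state Q-values over three power levels.
    q_online = [1.0, 2.5, 2.0]
    q_target = [1.2, 1.8, 3.0]
    r, gamma = 1.0, 0.9
    # DQL uses max(q_target): 1.0 + 0.9 * 3.0
    print(dql_target(r, q_target, gamma, done=False))
    # DDQL evaluates the online argmax (index 1): 1.0 + 0.9 * 1.8
    print(ddql_target(r, q_online, q_target, gamma, done=False))
```

Because DDQL evaluates the online network's chosen action with the target network, its target never exceeds the DQL target built from the same target-network values (for a non-negative discount), since `q_target[a_star] <= max(q_target)`. This is the mechanism by which it damps overestimation.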