
A Study on the Impact of Integrating Reinforcement Learning for Channel Prediction and Power Allocation Scheme in MISO-NOMA System.

Affiliations

Department of Electronic & Electrical Engineering, Brunel University London, Uxbridge UB8 3PH, UK.

Department of Telecommunication Engineering, Ahlia University, Manama 10878, Bahrain.

Publication Information

Sensors (Basel). 2023 Jan 26;23(3):1383. doi: 10.3390/s23031383.

Abstract

In this study, the influence of adopting Reinforcement Learning (RL) to predict the channel parameters for user devices in a power-domain Multi-Input Single-Output Non-Orthogonal Multiple Access (MISO-NOMA) system is examined. In the RL-based channel-prediction approach, a Q-learning algorithm is developed and incorporated into the NOMA system so that the resulting Q-model can be employed to predict the channel coefficients for every user device. The purpose of the developed Q-learning procedure is to maximize the received downlink sum-rate and reduce the estimation loss. To this end, the Q-algorithm is initialized using different channel statistics and then updated through interaction with the environment in order to approximate the channel coefficients for each device. The predicted parameters are utilized at the receiver side to recover the desired data. Furthermore, by maximizing the sum-rate of the examined user devices, the power factors for each user can be deduced analytically, so that the optimal power factor is allocated to every user device in the system. In addition, this work examines how channel prediction based on the developed Q-learning model and the power allocation policy can be combined for multiuser detection in the examined MISO-NOMA system. Simulation results, based on several performance metrics, demonstrate that the developed Q-learning algorithm is competitive for channel estimation when compared with benchmark schemes such as deep-learning-based long short-term memory (LSTM), the RL-based actor-critic algorithm, the RL-based state-action-reward-state-action (SARSA) algorithm, and a standard channel estimation scheme based on the minimum mean square error procedure.

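The abstract states that the power factors are deduced analytically by maximizing the sum-rate of the user devices. The closed form is not given in the abstract, so the sketch below instead checks the idea numerically for a two-user downlink NOMA pair: it searches over the fraction of power given to the strong (SIC-capable) user, subject to an assumed minimum-rate floor for the weak user. The channel gains, noise level, and rate floor are all illustrative assumptions.

```python
import numpy as np

# Assumed two-user downlink NOMA setup (illustrative values only).
P, N0 = 1.0, 0.1                 # total transmit power, noise power
g_strong, g_weak = 1.0, 0.25     # assumed channel gains |h|^2, strong > weak
R_MIN = 0.8                      # assumed QoS rate floor (bit/s/Hz), weak user

def rates(a):
    """Per-user rates when fraction a of the power goes to the strong user.

    The strong user cancels the weak user's signal via SIC; the weak user
    decodes its own signal treating the strong user's as interference.
    """
    r_strong = np.log2(1 + a * P * g_strong / N0)
    r_weak = np.log2(1 + (1 - a) * P * g_weak / (a * P * g_weak + N0))
    return r_strong, r_weak

# Grid search over the NOMA-consistent range (weak user gets more power),
# keeping only power splits that satisfy the weak user's rate floor.
best = max(
    (a for a in np.linspace(0.01, 0.49, 1000) if rates(a)[1] >= R_MIN),
    key=lambda a: sum(rates(a)),
)
```

With these numbers the sum-rate grows as more power shifts to the strong user, so the search returns the largest power fraction that still meets the weak user's rate floor; an analytical derivation, as in the paper, would obtain this binding point in closed form instead of by search.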

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b159/9921540/1c035f802617/sensors-23-01383-g001.jpg
