Raghu Ramkumar, Panju Mahadesh, Aggarwal Vaneet, Sharma Vinod
Indian Institute of Science, Karnataka 560012, India.
School of Industrial Engineering and School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA.
Entropy (Basel). 2021 Nov 23;23(12):1555. doi: 10.3390/e23121555.
Multicasting in wireless systems is a natural way to exploit the redundancy in user requests in a content-centric network. Power control and optimal scheduling can significantly improve the performance of a wireless multicast network under fading. However, the model-based approaches for power control and scheduling studied earlier do not scale to large state spaces or changing system dynamics. In this paper, we use deep reinforcement learning, approximating the Q-function with a deep neural network, to obtain a power control policy that matches the optimal policy for a small network. We show that a power control policy can be learned for reasonably large systems via this approach. Further, we use multi-timescale stochastic optimization to maintain the average power constraint. We demonstrate that a slight modification of the learning algorithm allows tracking of time-varying system statistics. Finally, we extend the multi-timescale approach to simultaneously learn the optimal queuing strategy along with power control. We demonstrate the scalability, tracking, and cross-layer optimization capabilities of our algorithms via simulations. The proposed multi-timescale approach can be used in general large state-space dynamical systems with multiple objectives and constraints, and may be of independent interest.
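To make the Q-function approximation concrete, the following is a minimal sketch (not the authors' code) of a deep Q-network for power control: the state is assumed to be a vector of per-user channel gains, the action is an index into a hypothetical discretized power grid, and the network outputs one Q-value per power level. The environment, dimensions, and hyperparameters are illustrative assumptions, not values from the paper.

import torch
import torch.nn as nn

N_USERS = 8                                   # assumed multicast group size
POWER_LEVELS = torch.linspace(0.0, 1.0, 16)   # assumed discrete power grid

class QNetwork(nn.Module):
    """Maps a channel-state vector to one Q-value per candidate power level."""
    def __init__(self, n_users: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_users, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_net = QNetwork(N_USERS, len(POWER_LEVELS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def act(state: torch.Tensor, eps: float = 0.1) -> int:
    """Epsilon-greedy choice of a power-level index for the current channel state."""
    if torch.rand(()) < eps:
        return int(torch.randint(len(POWER_LEVELS), ()))
    with torch.no_grad():
        return int(q_net(state).argmax())

def td_update(s, a, r, s_next, gamma: float = 0.99):
    """One-step temporal-difference update of the Q-network on a single transition."""
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max()
    loss = (q_net(s)[a] - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The chosen action index selects the transmit power POWER_LEVELS[a]; in practice the update would run over replayed minibatches rather than single transitions, but the structure above is the core of the approach.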
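The average power constraint is naturally handled as a constrained MDP via a Lagrangian penalty. Below is a minimal sketch of the multi-timescale idea under stated assumptions: the fast-timescale Q-learner (as in the previous sketch) optimizes a penalized reward, throughput minus lam times power, while a slower stochastic-approximation loop adapts the multiplier lam so that the long-run average transmit power meets a budget P_BAR. The environment here is a stand-in, not the paper's multicast model, and the step-size schedule is a generic choice.

import numpy as np

P_BAR = 0.4          # assumed average power budget
lam = 0.0            # Lagrange multiplier for the power constraint
avg_power = 0.0      # running estimate of average transmit power
rng = np.random.default_rng(0)

for k in range(1, 100001):
    power = rng.uniform(0.0, 1.0)        # placeholder for the learned policy's action
    throughput = np.log1p(power)         # placeholder rate: higher power, higher rate
    reward = throughput - lam * power    # penalized reward fed to the fast Q-learner

    # Fast timescale: the td_update from the previous sketch would consume
    # (state, action, reward, next_state) here.

    # Slow timescale: multiplier ascent with a smaller, slowly decaying step,
    # projected onto [0, inf) so lam remains a valid multiplier.
    b_k = 1.0 / (1.0 + k) ** 0.6
    avg_power += b_k * (power - avg_power)
    lam = max(0.0, lam + b_k * (avg_power - P_BAR))

Replacing the decaying step size b_k with a small constant one is the standard way to make such recursions track nonstationary statistics, which is in the spirit of the "slight modification" for time-varying systems that the abstract mentions, though the paper's exact modification may differ.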