Sharma Aakanksha, Balasubramanian Venki, Kamruzzaman Joarder
Melbourne Institute of Technology (MIT), Melbourne, VIC 3000, Australia.
Institute of Innovation, Science and Sustainability, Federation University Australia, Ballarat, VIC 3350, Australia.
Sensors (Basel). 2024 Feb 14;24(4):1216. doi: 10.3390/s24041216.
With the rapid advancement of the Internet of Things (IoT), network traffic is surging globally. Software-Defined Networks (SDNs) provide a holistic view of the network, facilitate software-based traffic analysis, and are better suited to handling dynamic loads than traditional networks. The control plane in the standard SDN architecture is designed around either a single controller or multiple distributed controllers; however, a logically centralized single controller faces severe bottleneck issues. Most solutions proposed in the literature rely on the static deployment of multiple controllers without considering flow fluctuations and traffic bursts, which leads to a lack of real-time load balancing among controllers and, ultimately, increased network latency. Some methods for dynamic controller mapping in multi-controller SDNs do account for load fluctuation and latency, but they suffer from the controller placement problem. Earlier, we proposed a priority scheduling and congestion control algorithm (eSDN) and dynamic mapping of controllers for dynamic SDN (dSDN) to address these issues. However, the future growth of the IoT is unpredictable and potentially exponential; to accommodate this trend, an intelligent solution is needed that can handle the complexity of a growing number of heterogeneous devices while minimizing network latency. This paper therefore continues our previous research and introduces temporal deep Q-learning in the dSDN controller. The temporal deep Q-learning network (tDQN) serves as a self-learning, reinforcement-based model: its agent learns to improve switch-controller mapping decisions through a reward-punish scheme, maximizing the goal of reducing network latency over the iterative learning process. Our approach, tDQN, effectively addresses dynamic flow mapping and latency optimization without increasing the number of optimally placed controllers. A multi-objective optimization problem for flow fluctuation is formulated to divert traffic dynamically to the best-suited controller. Extensive simulation results across varied network scenarios and traffic loads show that tDQN outperforms traditional networks, eSDNs, and dSDNs in terms of throughput, delay, jitter, packet delivery ratio, and packet loss.
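The abstract does not give implementation details, but the core idea (a deep Q-learning agent that maps flows from switches to controllers and is rewarded for reducing latency) can be illustrated with a minimal sketch. The following Python/PyTorch example is an assumption-laden illustration, not the authors' tDQN: the number of controllers, the state features (per-controller load), the toy latency model, and the network sizes are all hypothetical.

# Minimal sketch (not the authors' implementation): a DQN agent that maps a new
# flow to one of several SDN controllers, rewarded for actions that reduce
# observed latency. All constants and the latency model below are assumptions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

NUM_CONTROLLERS = 4          # assumed number of optimally placed controllers
STATE_DIM = NUM_CONTROLLERS  # assumed state: current load on each controller

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, NUM_CONTROLLERS))   # one Q-value per controller choice
    def forward(self, x):
        return self.net(x)

q_net = QNet()
optimizer = optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)
gamma, epsilon = 0.95, 0.1

def simulated_latency(loads, controller):
    """Toy latency model: latency grows with the chosen controller's load."""
    return loads[controller] + random.uniform(0.0, 0.1)

def step(loads):
    state = torch.tensor(loads, dtype=torch.float32)
    # Epsilon-greedy action: pick the controller to which the new flow is mapped.
    if random.random() < epsilon:
        action = random.randrange(NUM_CONTROLLERS)
    else:
        with torch.no_grad():
            action = int(q_net(state).argmax())
    latency = simulated_latency(loads, action)
    reward = -latency                 # reward-punish scheme: lower latency, higher reward
    next_loads = loads.copy()
    next_loads[action] += 0.05        # assigning the flow increases that controller's load
    replay.append((state, action, reward,
                   torch.tensor(next_loads, dtype=torch.float32)))
    return next_loads

def train(batch_size=32):
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    states = torch.stack([b[0] for b in batch])
    actions = torch.tensor([b[1] for b in batch])
    rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    next_states = torch.stack([b[3] for b in batch])
    # Standard DQN update: regress Q(s, a) toward r + gamma * max_a' Q(s', a').
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * q_net(next_states).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

loads = [0.2, 0.5, 0.1, 0.4]          # assumed initial per-controller loads
for _ in range(200):                  # iterative learning loop over arriving flows
    loads = step(loads)
    train()

The sketch keeps the agent's action space equal to the fixed set of controllers, mirroring the paper's claim that latency is optimized without adding controllers; the real tDQN additionally incorporates temporal structure and a multi-objective formulation that this toy example does not model.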