Information and Communication Engineering Department, Dongguk University, Seoul 04620, Korea.
Sensors (Basel). 2021 Oct 25;21(21):7053. doi: 10.3390/s21217053.
As Internet of Things (IoT) devices and network communications increase while bandwidth grows more slowly, the resulting constraints must be overcome. Because of network complexity and the uncertainty of emergency distribution parameters in smart environments, relying on predetermined rules seems illogical. Reinforcement learning (RL), a powerful machine learning approach, can handle such smart environments without a trainer or supervisor. Recently, we worked on bandwidth management in a smart environment with several fog fragments using limited shared bandwidth, where IoT devices may experience emergencies that are uncertain in time and sequence and that require more bandwidth for further higher-level communication. We introduced fog fragment cooperation using an RL approach under a predefined fixed threshold constraint. In this study, we extend this approach by removing the fixed threshold restriction through hierarchical reinforcement learning (HRL) and by completing the cooperation qualification. At the first learning hierarchy level of the proposed approach, the best threshold level is learned over time, and the final results are used by the second learning hierarchy level, where the fog node learns the best device to help an emergency device by temporarily lending it bandwidth. Although equipping the method with an adaptive threshold and restricting fog fragment cooperation make the learning procedure more difficult, the HRL approach increases the method's efficiency in terms of time and performance.
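The two-level idea in the abstract (an upper level that learns a threshold, a lower level that learns which device should lend bandwidth) can be illustrated with a minimal sketch. This is not the authors' method: it stacks two epsilon-greedy bandits with sample-average value estimates, a deliberate simplification of full HRL. The threshold candidates, the spare-bandwidth figures, the demand range, and the reward function `toy_reward` are all hypothetical values chosen only for illustration.

```python
import random

# All constants below are hypothetical; none of them come from the paper.
THRESHOLDS = [0.2, 0.5, 0.8]      # assumed share of bandwidth a lender must keep
SPARE = [0.1, 0.2, 0.9, 0.3]      # assumed spare bandwidth of each candidate lender


def epsilon_greedy(values, epsilon, rng):
    """Return a random index with probability epsilon, else the argmax index."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])


def toy_reward(threshold, lender, demand, spare):
    """Hypothetical reward: bandwidth lent minus half of the unmet demand."""
    lendable = max(0.0, spare[lender] - threshold)
    lent = min(demand, lendable)
    return lent - 0.5 * (demand - lent)


def train(episodes=20000, epsilon=0.1, seed=0):
    """Train both levels and return their value estimates."""
    rng = random.Random(seed)
    n_t, n_d = len(THRESHOLDS), len(SPARE)
    q_threshold = [0.0] * n_t                       # level 1: value of each threshold
    q_lender = [[0.0] * n_d for _ in range(n_t)]    # level 2: value of each lender per threshold
    count_t = [0] * n_t
    count_d = [[0] * n_d for _ in range(n_t)]
    for _ in range(episodes):
        demand = rng.uniform(0.2, 0.6)              # assumed emergency bandwidth demand
        t = epsilon_greedy(q_threshold, epsilon, rng)   # level 1 picks a threshold
        d = epsilon_greedy(q_lender[t], epsilon, rng)   # level 2 picks a lender under it
        r = toy_reward(THRESHOLDS[t], d, demand, SPARE)
        # Sample-average updates: each estimate is the running mean of its rewards.
        count_d[t][d] += 1
        q_lender[t][d] += (r - q_lender[t][d]) / count_d[t][d]
        count_t[t] += 1
        q_threshold[t] += (r - q_threshold[t]) / count_t[t]
    return q_threshold, q_lender


if __name__ == "__main__":
    q_t, q_l = train()
    best_t = max(range(len(q_t)), key=lambda i: q_t[i])
    best_d = max(range(len(q_l[best_t])), key=lambda i: q_l[best_t][i])
    print("learned threshold:", THRESHOLDS[best_t], "preferred lender:", best_d)
```

In this toy setting the lower level is conditioned on the threshold chosen by the upper level, which mirrors the abstract's statement that the first level's result is used by the second. A realistic version would need state (for example, current device loads), a reward that captures the cost of lending, and the cooperation restrictions between fog fragments, none of which are modeled here.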