Melo Mário, Aquino Gibeon
Academic Department, Federal Institute of Rio Grande do Norte, Lajes 59535-000, Brazil.
Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, Natal 59078-970, Brazil.
Sensors (Basel). 2021 Oct 29;21(21):7181. doi: 10.3390/s21217181.
Fault tolerance in IoT systems is difficult to achieve because of their complexity, dynamicity, and heterogeneity. IoT systems are typically designed and built in layers, and each layer has its own requirements and fault tolerance strategies. However, errors in one layer can propagate and affect others, so it is impractical to rely on a centralized fault tolerance approach for an entire system. Instead, fault tolerance must be addressed across multiple layers so that they can collaborate and exchange information. This study proposes a multi-layer fault tolerance approach that interconnects the layers of an IoT system, allowing information exchange and collaboration in order to attain dependability. To that end, we define an event-driven framework called FaTEMa (Fault Tolerance Event Manager) that creates a dedicated fault-related communication channel to propagate events across the levels of the system. The implemented framework assists with error detection and continued service. It also offers extension points to support heterogeneous communication protocols and to add new capabilities. Our empirical results show that introducing FaTEMa improved error detection and error resolution times, and consequently system availability. In addition, the use of FaTEMa improved reliability and reduced the number of failures produced.
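The idea of a dedicated fault-related channel that propagates events across layers can be illustrated with a minimal publish/subscribe sketch. This is not FaTEMa's actual API: the class names `FaultEvent` and `FaultEventChannel`, the event fields, the layer names, and the `sensor_timeout` event type are all hypothetical, chosen only to show how handlers in different layers might react to one fault event.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FaultEvent:
    """A fault-related event. Field names are illustrative assumptions."""
    layer: str   # layer that detected the problem, e.g. "device"
    kind: str    # event type, e.g. "sensor_timeout"
    detail: str  # free-form description of what was observed


Handler = Callable[[FaultEvent], None]


class FaultEventChannel:
    """Hypothetical channel that carries only fault events, separate from data traffic."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, kind: str, handler: Handler) -> None:
        """Register a handler (owned by any layer) for one event type."""
        self._handlers.setdefault(kind, []).append(handler)

    def publish(self, event: FaultEvent) -> int:
        """Deliver the event to every subscribed handler; return how many were notified."""
        handlers = self._handlers.get(event.kind, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


# Usage: two upper layers react to a fault detected in the device layer.
log: List[str] = []
channel = FaultEventChannel()
channel.subscribe(
    "sensor_timeout",
    lambda e: log.append(f"gateway: switch to backup after {e.kind} from {e.layer}"),
)
channel.subscribe(
    "sensor_timeout",
    lambda e: log.append(f"cloud: mark readings from {e.layer} as stale"),
)
notified = channel.publish(FaultEvent("device", "sensor_timeout", "no reading for 30 s"))
```

In this sketch the device layer does not need to know which other layers care about the fault; it only publishes to the channel, which is one way the cross-layer collaboration described above could be decoupled from normal data traffic.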