Said Ghawar, Ghani Anwar, Ullah Ata, Alzahrani Abdulrahman, Azeem Muhammad, Ahmad Rashid, Kim Do-Hyeun
Department of Computer Science, International Islamic University, Islamabad, 44000, Pakistan.
Department of Computing and Technology, Iqra University, H-9 Campus, Islamabad, 44000, Pakistan.
Sci Rep. 2024 Sep 4;14(1):20595. doi: 10.1038/s41598-024-71682-y.
The Internet of Things (IoT) generates substantial data through sensors for diverse applications, such as healthcare services. This article addresses the challenge of using resources efficiently in resource-scarce IoT-enabled sensors to improve data collection, transmission, and storage. Redundant data transmission from sensors covering overlapping areas incurs additional communication and storage costs. Existing schemes, namely Asymmetric Extremum (AE) and Rapid Asymmetric Maximum (RAM), employ fixed and variable-sized windows during chunking. However, these schemes run into problems when selecting the index value that decides the variable window size: the value may remain zero or very low, resulting in poor deduplication. This article resolves the issue with the proposed Controlled Cut-point Identification Algorithm (CCIA), which is designed to restrict the variable-sized window to a certain threshold. The index value for deciding the threshold is always larger than half the size of the fixed window. This helps to find more duplicates, while an upper-limit offset is also applied to avoid an unnecessarily large window, which could cause extensive computation costs. Extensive simulations were performed by deploying Windows Communication Foundation services in the Azure cloud. The results demonstrate the superiority of CCIA on various metrics, including chunk number, average chunk size, minimum and maximum chunk number, variable chunking size, and probability of failure for cut-point identification. Compared with its competitors, RAM and AE, CCIA performs better across key parameters. Specifically, CCIA outperforms them in total number of chunks (6.81%, 14.17%), average number of chunks (4.39%, 18.45%), and minimum chunk size (153%, 190%). These results highlight the effectiveness of CCIA in optimizing data transmission and storage within IoT systems, showing its potential for improved resource utilization and reduced operational costs.
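The bounded variable window described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CCIA implementation; it is a RAM-style chunker written under stated assumptions. The function name `bounded_ram_chunks` and the parameters `fixed_window` and `max_offset` are hypothetical, and the rules used here (the variable window must be longer than half the fixed window, a cut is forced once the variable window reaches `max_offset` bytes, and a cut point is the first byte at least as large as the fixed-window maximum) are one plausible reading of the abstract rather than details it confirms.

```python
def bounded_ram_chunks(data: bytes, fixed_window: int = 8,
                       max_offset: int = 64) -> list[bytes]:
    """Split data into content-defined chunks with a bounded variable window.

    Illustrative only: the bounds below are assumptions, not the paper's
    exact CCIA rules.
    """
    if fixed_window < 1 or max_offset <= fixed_window // 2:
        raise ValueError("need fixed_window >= 1 and max_offset > fixed_window // 2")

    # Assumed lower bound: variable window longer than half the fixed window.
    min_offset = fixed_window // 2 + 1

    chunks = []
    start, n = 0, len(data)
    while start < n:
        fixed_end = start + fixed_window
        if fixed_end >= n:
            # Not enough data left for a variable window: emit the tail.
            chunks.append(data[start:])
            break

        # Largest byte value seen in the fixed window.
        threshold = max(data[start:fixed_end])

        # Forced cut (exclusive end) if no byte qualifies within max_offset.
        cut = min(fixed_end + max_offset, n)

        # Scan the variable window, skipping the first min_offset - 1 bytes,
        # and cut at the first byte that is >= the fixed-window maximum.
        for i in range(fixed_end + min_offset - 1, cut):
            if data[i] >= threshold:
                cut = i + 1
                break

        chunks.append(data[start:cut])
        start = cut
    return chunks


if __name__ == "__main__":
    sample = bytes((i * 37 + 11) % 251 for i in range(2000))
    sizes = [len(c) for c in bounded_ram_chunks(sample)]
    print(len(sizes), min(sizes), max(sizes))
```

In this sketch every chunk except possibly the last has a length between `fixed_window + fixed_window // 2 + 1` and `fixed_window + max_offset` bytes, so the variable window can neither stay at zero nor grow without limit.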