Gouda Walaa, Tahir Sidra, Alanazi Saad, Almufareh Maram, Alwakid Ghadah
Department of Computer Engineering and Network, College of Computer and Information Sciences, Jouf University, Sakaka 72341, Al Jouf, Saudi Arabia.
Electrical Engineering Department, Faculty of Engineering at Shoubra, Benha University, Cairo 13518, Egypt.
Sensors (Basel). 2022 Sep 1;22(17):6617. doi: 10.3390/s22176617.
The Internet of Things (IoT) is a system of interconnected devices and sensors that collect and share data over the internet. The data these sensors provide may contain outliers or exhibit anomalous behavior as a result of, for example, attacks or device failure. However, most existing outlier detection algorithms rely on labeled data, which is often hard to obtain in the IoT domain. More crucially, the volume of IoT data is continually increasing, which makes it necessary to predict and identify the classes of future data. In this study, we propose an unsupervised technique based on a deep Variational Auto-Encoder (VAE) that detects outliers in IoT data by leveraging two properties of the VAE: its reconstruction ability and the low-dimensional latent representation it learns of the input data. First, the input data are standardized. Then, the VAE reconstructs each input from the low-dimensional representation of its latent variables. Finally, the reconstruction error between the original observation and its reconstruction is used as the outlier score. Our model was trained in an unsupervised manner using only normal data, without labels, and evaluated on the Statlog (Landsat Satellite) dataset. The unsupervised model achieved promising results comparable to those of state-of-the-art outlier detection schemes, with a precision of ≈90% and an F1 score of 79%.
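The three steps named in the abstract (standardize, reconstruct from a low-dimensional representation, score by reconstruction error) can be illustrated with a minimal sketch. This is not the authors' model: a linear projection obtained by SVD (equivalent to PCA) stands in for the trained VAE encoder and decoder so that the example stays short and runs with NumPy alone. The synthetic data, the latent dimension of 2, the 99th-percentile threshold, and all function names are assumptions made for illustration.

```python
import numpy as np


def standardize(reference, data):
    """Scale `data` using the per-feature mean and std of `reference`."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (data - mean) / std


def fit_linear_autoencoder(train_std, latent_dim):
    """Stand-in for VAE training: top principal directions via SVD.

    Returns an array of shape (latent_dim, n_features). Assumes
    `train_std` is already centered, which standardization ensures.
    """
    _, _, vt = np.linalg.svd(train_std, full_matrices=False)
    return vt[:latent_dim]


def reconstruction_scores(components, data_std):
    """Outlier score = mean squared reconstruction error per sample."""
    latent = data_std @ components.T   # "encode" to the low-dim space
    recon = latent @ components        # "decode" back to input space
    return np.mean((data_std - recon) ** 2, axis=1)


# Synthetic data (assumption): normal samples lie near a 2-D subspace
# of a 6-D feature space; outliers are spread over all 6 dimensions.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 6))


def sample_normal(n):
    return rng.normal(size=(n, 2)) @ basis + 0.05 * rng.normal(size=(n, 6))


normal_train = sample_normal(500)
normal_test = sample_normal(200)
outliers = rng.normal(scale=3.0, size=(20, 6))

# Step 1: standardize everything with statistics from the training data.
train_std = standardize(normal_train, normal_train)
test_std = standardize(normal_train, normal_test)
out_std = standardize(normal_train, outliers)

# Step 2: fit the stand-in model on normal, unlabeled data only.
components = fit_linear_autoencoder(train_std, latent_dim=2)

# Step 3: score by reconstruction error; flag scores above a threshold
# taken from the training scores (the 99th percentile is an assumption).
threshold = np.quantile(reconstruction_scores(components, train_std), 0.99)
test_scores = reconstruction_scores(components, test_std)
outlier_scores = reconstruction_scores(components, out_std)
test_flags = test_scores > threshold
outlier_flags = outlier_scores > threshold

print("mean score, normal test samples:", test_scores.mean())
print("mean score, outliers:", outlier_scores.mean())
print("fraction of outliers flagged:", outlier_flags.mean())
```

In this sketch the model only ever sees normal samples, so points that do not fit the learned low-dimensional structure reconstruct poorly and receive high scores. A VAE replaces the linear projection with a nonlinear, probabilistic encoder and decoder, but the scoring step has the same shape.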