Manzano Sanchez Ricardo Alejandro, Zaman Marzia, Goel Nishith, Naik Kshirasagar, Joshi Rohit
Cistech Limited, 201-203 Colonnade Rd, Nepean, ON K2E 7K3, Canada.
Cistel Technology Inc., 30 Concourse Gate, Nepean, ON K2E 7V7, Canada.
Sensors (Basel). 2022 Oct 12;22(20):7726. doi: 10.3390/s22207726.
In recent years, anomaly detection and machine learning have been used in intrusion detection systems to detect anomalies on Internet of Things (IoT) networks. These systems rely on machine learning and deep learning to improve detection accuracy. However, the robustness of a model depends on the number of data samples available, the quality of the data, and the distribution of the data classes. In this paper, we focus specifically on the amount of data and on class imbalance, since both are key factors in IoT as network traffic grows exponentially. We therefore propose a framework that uses a big data methodology with Hadoop-Spark to train and test multi-class and binary classification, with a one-vs-rest strategy, for intrusion detection using the entire BoT IoT dataset. We evaluate all the algorithms available in Hadoop-Spark in terms of accuracy and processing time. In addition, since the BoT IoT dataset is highly imbalanced, we improve the accuracy of detecting minority classes by generating more data samples with a Conditional Tabular Generative Adversarial Network (CTGAN). In general, our proposed model outperforms other published models, including our previous model. Using our proposed methodology, the F1-score for one of the minority classes, the Theft attack, improved from 42% to 99%.
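To illustrate why a per-class F1-score is reported for minority classes rather than overall accuracy alone, the following is a minimal sketch in plain Python, not the Hadoop-Spark pipeline used in the paper. The helper names (`one_vs_rest_labels`, `f1_score`), the toy label lists, and the class counts are assumptions made for illustration only; they are not data or results from the study.

```python
def one_vs_rest_labels(labels, positive_class):
    """Reduce multi-class labels to binary: 1 for positive_class, 0 for the rest."""
    return [1 if y == positive_class else 0 for y in labels]


def f1_score(y_true, y_pred):
    """Binary F1-score: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical, deliberately imbalanced toy labels (not taken from the dataset).
true_labels = ["DDoS"] * 6 + ["Normal"] * 2 + ["Theft"] * 2
# A hypothetical classifier that mislabels one of the two Theft samples as DDoS.
pred_labels = ["DDoS"] * 6 + ["Normal"] * 2 + ["DDoS", "Theft"]

# Overall accuracy across all classes.
accuracy = sum(t == p for t, p in zip(true_labels, pred_labels)) / len(true_labels)

# One-vs-rest evaluation: score each class against all the others.
per_class_f1 = {
    cls: f1_score(
        one_vs_rest_labels(true_labels, cls),
        one_vs_rest_labels(pred_labels, cls),
    )
    for cls in sorted(set(true_labels))
}

print(f"accuracy = {accuracy:.2f}")
for cls, score in per_class_f1.items():
    print(f"F1({cls}) = {score:.3f}")
```

In this toy case the overall accuracy is 0.90 while the F1-score of the minority class is only about 0.667, which is the kind of gap that motivates adding samples for minority classes.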