IoT Research Center, Pusan National University, Busan 609735, Korea.
Faculty of Information Technology, Hung Yen University of Technology and Education, Hung Yen 160000, Vietnam.
Sensors (Basel). 2022 Feb 3;22(3):1154. doi: 10.3390/s22031154.
In recent years, the research community has designed and developed many methods for intrusion detection systems (IDS), and some report a perfect detection rate on IDS datasets. Deep neural networks (DNNs) are a representative example and are widely applied in IDS. However, DNN architectures are becoming increasingly complex and demand substantial hardware computing resources. In addition, it is difficult for humans to obtain explanations for the decisions these DNN models make on large IoT-based IDS datasets. Many proposed IDS methods have not been applied in practical deployments because they give cybersecurity experts no explanation that would help them optimize their own decisions in light of the IDS model's judgments. This paper aims to enhance the attack detection performance of IDS on big IoT-based IDS datasets and to provide explanations of machine learning (ML) model predictions. The proposed ML-based IDS method is based on the ensemble trees approach, including decision tree (DT) and random forest (RF) classifiers, which do not require high computing resources for training. Two big datasets, NF-BoT-IoT-v2 and NF-ToN-IoT-v2 (new versions of the original BoT-IoT and ToN-IoT datasets), which use the feature set of the NetFlow meter, are used for the experimental evaluation of the proposed method. The IoTDS20 dataset is also used for experiments. Furthermore, SHapley Additive exPlanations (SHAP) is applied as the eXplainable AI (XAI) methodology to explain and interpret the classification decisions of the DT and RF models; this is not only effective in interpreting the final decision of the ensemble tree approach but also helps cybersecurity experts quickly optimize and evaluate the correctness of their judgments based on the explanations of the results.
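To make the idea of Shapley-based attribution concrete, the following is a minimal sketch, not the authors' code and not the SHAP library's tree explainer. It computes exact Shapley values by brute-force enumeration of feature subsets for a single flow. The feature names, the hand-written `toy_tree` classifier, and the background flows are all hypothetical and chosen only for illustration; a real pipeline would presumably train DT and RF models on the datasets named above and use an optimized explainer.

```python
from itertools import combinations
from math import factorial

# Hypothetical flow features (illustrative only, not taken from the datasets).
FEATURES = ["pkts_per_sec", "avg_pkt_size", "dst_port_entropy"]


def toy_tree(x):
    """Attack score from a tiny hand-written decision tree (an assumption)."""
    if x["pkts_per_sec"] > 1000:
        return 0.9 if x["avg_pkt_size"] < 100 else 0.6
    return 0.7 if x["dst_port_entropy"] > 3.0 else 0.1


def value(model, x, background, subset):
    """Mean model output with features in `subset` taken from x and the
    remaining features taken from each background row in turn."""
    total = 0.0
    for row in background:
        mixed = {f: (x[f] if f in subset else row[f]) for f in FEATURES}
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every subset of the other features.
    Cost grows exponentially with the number of features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contrib = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                with_f = value(model, x, background, set(s) | {f})
                without_f = value(model, x, background, set(s))
                contrib += weight * (with_f - without_f)
        phi[f] = contrib
    return phi


# Hypothetical benign background flows and one suspicious flow to explain.
background = [
    {"pkts_per_sec": 50, "avg_pkt_size": 500, "dst_port_entropy": 1.0},
    {"pkts_per_sec": 200, "avg_pkt_size": 800, "dst_port_entropy": 0.5},
    {"pkts_per_sec": 20, "avg_pkt_size": 300, "dst_port_entropy": 2.0},
]
flow = {"pkts_per_sec": 5000, "avg_pkt_size": 60, "dst_port_entropy": 0.8}

base = value(toy_tree, flow, background, set())  # average score on background
phi = shapley_values(toy_tree, flow, background)

for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
print(f"base + sum(phi) = {base + sum(phi.values()):.3f}, "
      f"model output = {toy_tree(flow):.3f}")
```

The per-feature contributions sum to the difference between the model's score for this flow and the average score on the background flows, which is the additive property that lets an analyst read each value as that feature's share of the decision.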