Binbusayyis Adel, Vaiyapuri Thavavel
College of Computer Science and Engineering, Prince Sattam bin Abdulaziz University, AlKharj, Saudi Arabia.
Heliyon. 2020 Jul 9;6(7):e04262. doi: 10.1016/j.heliyon.2020.e04262. eCollection 2020 Jul.
The revolutionary advances in network technologies have spearheaded the design of advanced cyberattacks to surpass traditional security defense with dreadful consequences. Recently, Intrusion Detection System (IDS) is considered as a pivotal element in network security infrastructures to achieve solid line of protection against cyberattacks. The prime challenges presented to IDS are curse of high dimensionality and class imbalance that tends to increase the detection time and degrade the efficiency of IDS. As a result, feature selection plays an important role in enabling to identify the most significant features for intrusion detection. Although, several feature evaluation measures are being proposed for feature selection in literature, there is no consensus on which measures are best for intrusion detection. Therein, this work aims at recommending the most appropriate feature evaluation measure for building an efficient IDS. In this direction, four filter-based feature evaluation measures that stem from different theories such as Consistency, Correlation, Information and Distance are investigated for their potential implications in enhancing the detection ability of IDS model for different classes of attacks. Along with this, the influence of the selected features on classification accuracy of an IDS model is analyzed using four different categories of classifiers namely, K-nearest neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM) and Deep Belief Network (DBN). Finally, a two-step statistical significance test is conducted on the experimental results to determine which feature evaluation measure contributes statistically significant difference in IDS performance. All the experimental comparisons are performed on two benchmark intrusion detection datasets, NSL-KDD and UNSW-NB15. In these experiments, consistency measure has best influenced the IDS model in improving the detection ability with regard to detection rate (DR), false alarm rate (FAR), kappa statistics (KS) and identifying the most significant features for intrusion detection. Also, from the analysis results, it is revealed that RF is the ideal classifier to be used in conjunction with any of these four feature evaluation measures to achieve better detection accuracy than others. From the statistical results, we recommend the use of consistency measure for designing an efficient IDS in terms of DR and FAR.
网络技术的革命性进展引领了先进网络攻击的设计,这些攻击超越了传统安全防御,带来了可怕的后果。最近,入侵检测系统(IDS)被视为网络安全基础设施中的关键要素,以实现针对网络攻击的坚实防线。IDS面临的主要挑战是高维度诅咒和类不平衡,这往往会增加检测时间并降低IDS的效率。因此,特征选择在识别用于入侵检测的最重要特征方面起着重要作用。尽管文献中提出了几种用于特征选择的特征评估方法,但对于哪种方法最适合入侵检测尚无共识。在此,这项工作旨在推荐用于构建高效IDS的最合适特征评估方法。在这个方向上,研究了四种基于过滤器的特征评估方法,它们源于不同的理论,如一致性、相关性、信息和距离,以探讨它们对提高IDS模型针对不同类型攻击的检测能力的潜在影响。与此同时,使用四种不同类型的分类器,即K近邻(KNN)、随机森林(RF)、支持向量机(SVM)和深度信念网络(DBN),分析所选特征对IDS模型分类准确率的影响。最后,对实验结果进行两步统计显著性检验,以确定哪种特征评估方法在IDS性能上产生统计显著差异。所有实验比较均在两个基准入侵检测数据集NSL-KDD和UNSW-NB15上进行。在这些实验中,一致性度量在提高检测率(DR)、误报率(FAR)、kappa统计量(KS)方面对IDS模型的检测能力有最佳影响,并能识别用于入侵检测的最重要特征。此外,从分析结果可知,RF是与这四种特征评估方法中的任何一种结合使用以实现比其他方法更好检测准确率的理想分类器。从统计结果来看,就DR和FAR而言,我们建议使用一致性度量来设计高效的IDS。