Ahmed Usman, Jiangbin Zheng, Almogren Ahmad, Sadiq Muhammad, Rehman Ateeq Ur, Sadiq M T, Choi Jaeyoung
School of Software, Northwestern Polytechnical University, Xi'an, 710072, China.
Chair of Cyber Security, Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, 11633, Saudi Arabia.
Sci Rep. 2024 Dec 17;14(1):30532. doi: 10.1038/s41598-024-81151-1.
The novelty and growing sophistication of cyber threats make accurate, interpretable machine learning models more necessary than ever for Intrusion Detection and Prevention Systems. This study addresses that need by applying Explainable AI techniques, including feature selection based on SHapley Additive exPlanations (SHAP), to improve model performance, robustness, and transparency. The method systematically employs different classifiers and proposes a new hybrid method called Hybrid Bagging-Boosting and Boosting on Residuals. The work proceeds in four steps: (1) a multistep evaluation and fine-tuning of hybrid ensemble learning methods for binary classification; (2) feature selection using SHAP values, followed by retraining of the hybrid model to improve performance and reduce overfitting; (3) generalization of the proposed model to multiclass classification; and (4) evaluation using standard metrics such as accuracy, precision, recall, and F1-score. Key results indicate that the proposed methods outperform state-of-the-art algorithms, achieving a peak accuracy of 98.47% and an F1-score of 96.19%. These improvements stem from advanced feature selection and resampling techniques, which enhance model accuracy and balance precision and recall. Integrating SHAP-based feature selection with hybrid ensemble methods significantly boosts the predictive and explanatory power of Intrusion Detection and Prevention Systems, addressing common pitfalls in traditional cybersecurity models. This study paves the way for further research on statistical innovations to enhance the performance of Intrusion Detection and Prevention Systems.
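The idea behind Shapley-value feature selection can be illustrated with a small, self-contained sketch. This is not the authors' pipeline: the abstract does not specify their implementation, and a real system would typically apply a SHAP library to a trained ensemble. The sketch below instead enumerates feature coalitions exactly, which is only feasible for a handful of features. The scoring function `toy_score`, its weights, the sample data, the all-zero baseline, and the helper names `shapley_values`, `mean_abs_shapley`, and `select_top_k` are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one sample.

    Features outside a coalition are replaced by their baseline value,
    which is one common (assumed) way to model a "missing" feature.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shapley(predict, samples, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


def select_top_k(importance, k):
    """Indices of the k most important features, in ascending index order."""
    ranked = sorted(range(len(importance)), key=lambda i: -importance[i])
    return sorted(ranked[:k])


# Hypothetical stand-in for a trained classifier's score function.
WEIGHTS = [3.0, 0.0, -1.0, 0.5]


def toy_score(x):
    return 0.1 + sum(w * v for w, v in zip(WEIGHTS, x))


samples = [[1.0, 5.0, 2.0, 2.0], [2.0, 1.0, 0.0, 4.0], [0.0, 3.0, 1.0, 2.0]]
baseline = [0.0, 0.0, 0.0, 0.0]

importance = mean_abs_shapley(toy_score, samples, baseline)
selected = select_top_k(importance, 2)
print("importance:", importance)
print("selected feature indices:", selected)
```

In this toy setup feature 1 has zero weight, so it receives zero attribution and is dropped, while features 0 and 3 are kept. A model retrained on only the selected columns would correspond to the retraining step described above, under the same illustrative assumptions.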