Enhanced Intrusion Detection with Data Stream Classification and Concept Drift Guided by the Incremental Learning Genetic Programming Combiner.

Affiliations

School of Computer Sciences, Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia.

National Advanced IPv6 Centre (NAv6), Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia.

Publication information

Sensors (Basel). 2023 Apr 4;23(7):3736. doi: 10.3390/s23073736.

DOI:10.3390/s23073736

PMID:37050795

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10098915/

Abstract

Concept drift (CD) in data streaming scenarios such as network intrusion detection systems (IDS) refers to a change in the statistical distribution of the data over time. There are five principal variants of CD: incremental, gradual, recurrent, sudden, and blip. Genetic programming combiner (GPC) classification is an effective core candidate for data stream classification in IDS. However, its basic structure relies on traditional static machine learning models that receive one-time training, which limits its ability to handle CD. To address this issue, we propose an extended variant of the GPC using three main components. First, we replace the existing classifiers with alternatives: online sequential extreme learning machine (OSELM), feature adaptive OSELM (FA-OSELM), and knowledge preservation OSELM (KP-OSELM). Second, we add two new components to the GPC: data balancing and classifier update. Third, the coordination between the sub-models produces three novel variants of the GPC: GPC-KOS for KP-OSELM; GPC-FOS for FA-OSELM; and GPC-OS for OSELM. This article presents the first data stream-based classification framework that provides novel strategies for handling CD variants. The experimental results demonstrate that both GPC-KOS and GPC-FOS outperform the traditional GPC and other state-of-the-art methods, and that the transfer learning and memory features contribute to the effective handling of most types of CD. Moreover, the application of our incremental variants to real-world datasets (KDD Cup '99, CICIDS-2017, CSE-CIC-IDS-2018, and ISCX2012) demonstrates improved performance (GPC-FOS on CSE-CIC-IDS-2018 and CICIDS-2017; GPC-KOS on ISCX2012 and KDD Cup '99), with maximum accuracy rates of 100% for GPC-KOS and 98% for GPC-FOS. However, our GPC variants do not show superior performance in handling blip drift.
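The five drift variants named in the abstract can be illustrated with a toy one-dimensional stream. The sketch below is not taken from the article: it assumes the definitions commonly used in the data stream literature, and the function names `ramp` and `concept_mean`, the window positions, and the two concept means (0.0 for the old concept, 1.0 for the new one) are all hypothetical choices made for illustration.

```python
import random


def ramp(t, n):
    """Linear ramp from 0 to 1 across the middle half of a stream of length n."""
    return min(1.0, max(0.0, (t - n / 4) / (n / 2)))


def concept_mean(kind, t, n, rng=random):
    """Mean of the data distribution at step t (old concept = 0.0, new = 1.0).

    The shapes below are assumed, textbook-style definitions, not the
    article's own formulation.
    """
    if kind == "sudden":
        # Abrupt, permanent switch at the midpoint.
        return 0.0 if t < n // 2 else 1.0
    if kind == "incremental":
        # The concept itself slides through intermediate values.
        return ramp(t, n)
    if kind == "gradual":
        # Samples alternate between old and new concepts; the new one
        # becomes increasingly likely over time.
        return 1.0 if rng.random() < ramp(t, n) else 0.0
    if kind == "recurrent":
        # The new concept appears for a while, then the old one returns.
        return 1.0 if n // 3 <= t < 2 * n // 3 else 0.0
    if kind == "blip":
        # A one-step deviation that a detector should ideally ignore.
        return 1.0 if t == n // 2 else 0.0
    raise ValueError(f"unknown drift kind: {kind}")


if __name__ == "__main__":
    n = 12
    for kind in ("sudden", "incremental", "gradual", "recurrent", "blip"):
        means = [round(concept_mean(kind, t, n), 2) for t in range(n)]
        print(f"{kind:12s} {means}")
```

Under these assumptions, a classifier trained once on the old concept degrades permanently under sudden or incremental drift, while a blip lasts a single step, so adapting to it is usually counterproductive.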
