School of Cyberspace Security, Hangzhou Dianzi University, Hangzhou 310018, China.
School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 545001, China.
Sensors (Basel). 2022 Apr 8;22(8):2874. doi: 10.3390/s22082874.
Currently, hidden Markov model (HMM)-based multi-step attack detection models are mainly trained with the unsupervised Baum-Welch algorithm, which is sensitive to the initial values of the model parameters. However, training typically starts from random or average parameter initialization. This frequently causes training to converge to a local optimum, so the model fits the alert logs poorly and its detection effectiveness drops. To solve this issue, we propose a pre-training method for multi-step attack detection models that exploits the high semantic similarity of alerts belonging to the same attack phase. The method first clusters the alerts by their semantic information, pre-assigning each alert to an attack phase. Then, the distance from each alert vector to each attack phase is converted into the probability that the phase generates that alert, and these probabilities replace the initial parameter values used by Baum-Welch. The effectiveness of the proposed method is evaluated on the DARPA 2000, DEFCON21 CTF, and ISCXIDS 2012 datasets. The experimental results show that the HMM multi-step attack detection method using the proposed parameter pre-training achieves higher detection accuracy than the Baum-Welch-based, K-means-based, and transfer learning differential evolution-based HMM multi-step attack detection methods.
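The pre-training idea (cluster semantically similar alerts into phases, then turn alert-to-phase distances into emission probabilities that seed Baum-Welch) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each alert type is assumed to already have a semantic vector. Plain k-means stands in for the clustering step, whose exact algorithm the abstract does not specify. The exponential kernel `exp(-distance / temperature)` used to convert distances into probabilities is also an assumption. The transition matrix and initial state distribution are left uniform. All function names (`kmeans`, `pretrained_hmm_init`) and the `temperature` parameter are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors: List[Vector], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Plain k-means (stand-in clustering): returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each alert vector joins its nearest phase centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                      for v in vectors]
        # Update step: each centroid becomes the mean of its member vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def pretrained_hmm_init(alert_vectors: List[Vector], n_phases: int,
                        temperature: float = 1.0, seed: int = 0):
    """Hypothetical initial (pi, A, B) for Baum-Welch.

    B[i][k] is the assumed probability that attack phase i emits alert type k;
    alerts closer to the phase centroid receive larger probability.
    """
    centroids, _ = kmeans(alert_vectors, n_phases, seed=seed)
    B = []
    for centroid in centroids:
        dists = [euclidean(v, centroid) for v in alert_vectors]
        d_min = min(dists)  # shift by the minimum distance to avoid underflow
        weights = [math.exp(-(d - d_min) / temperature) for d in dists]
        total = sum(weights)
        B.append([w / total for w in weights])  # each row sums to 1
    pi = [1.0 / n_phases] * n_phases                    # uniform start (assumed)
    A = [[1.0 / n_phases] * n_phases for _ in range(n_phases)]  # uniform (assumed)
    return pi, A, B


if __name__ == "__main__":
    # Toy 2-D "semantic" vectors: two scan-like and two exploit-like alert types.
    alerts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    _, _, B = pretrained_hmm_init(alerts, n_phases=2)
    for row in B:
        print([round(p, 3) for p in row])
```

On this toy input, each row of `B` concentrates almost all of its mass on one pair of alert types. The rows can then be handed to any Baum-Welch implementation in place of random or average initial emission values. Smaller `temperature` values make the initial emissions sharper, and larger values make them closer to uniform.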