Efficient Detection of Malicious Traffic Using a Decision Tree-Based Proximal Policy Optimisation Algorithm: A Deep Reinforcement Learning Malicious Traffic Detection Model Incorporating Entropy.

Author Information

Zhao Yuntao, Ma Deao, Liu Wei

Affiliation

School of Information Science and Engineering, Shenyang Ligong University, Shenyang 110159, China.

Publication Information

Entropy (Basel). 2024 Jul 30;26(8):648. doi: 10.3390/e26080648.

PMID: 39202118

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11353857/

Abstract

With the spread of the Internet and advances in information technology, cyber attacks have become an increasingly serious problem, posing a major threat to the security of individuals, enterprises, and the state. This makes network intrusion detection technology critically important. In this paper, a malicious traffic detection model is constructed from an entropy-based decision tree classifier and the proximal policy optimisation (PPO) algorithm from deep reinforcement learning. First, a decision tree is used to make a preliminary classification of the dataset based on information entropy; the importance score of each feature in this classification is calculated, and features with low contributions are removed. The reduced dataset is then passed to the PPO model for detection, with an entropy regularisation term introduced into the PPO update. Finally, the deep reinforcement learning algorithm continuously trains and updates the model parameters during detection, yielding a detection model with high accuracy. Experiments show that the PPO-based malicious traffic detection model achieves a binary classification accuracy of 99.17% on the CIC-IDS2017 dataset used in this paper.
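The abstract names two stages but gives no formulas, so the following is only a minimal sketch of each under stated assumptions, in plain Python with the standard library. For the first stage, it scores each feature by the information gain of its best single threshold split (a one-level, entropy-criterion decision tree). This simplifies the full-tree importance scores the paper computes, and the cut-off `threshold=0.05` for a "low contribution" is a hypothetical value. For the second stage, it computes the standard PPO clipped surrogate loss with an entropy bonus for a two-action policy (benign vs. malicious). The value-function term is omitted. The values `clip_eps=0.2` and `ent_coef=0.01`, and the ±1 reward in the hypothetical `reward` helper, are illustrative assumptions, not settings taken from the paper.

```python
import math
from collections import Counter


# ---- Stage 1: entropy-based feature scoring and selection ----

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest information gain from one threshold split 'value <= t' on a feature.

    Simplification (assumption): a one-level decision tree (stump) per feature,
    not the full tree whose importance scores the paper uses.
    """
    base = entropy(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # candidate thresholds; the max value splits nothing
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        best = max(best, base - weighted)
    return best


def feature_importances(X, y):
    """Information-gain score per feature column, normalised to sum to 1."""
    gains = [best_split_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    total = sum(gains)
    return [g / total if total > 0 else 0.0 for g in gains]


def select_features(X, y, threshold=0.05):
    """Drop features whose normalised score is below `threshold` (hypothetical cut-off)."""
    scores = feature_importances(X, y)
    keep = [j for j, s in enumerate(scores) if s >= threshold]
    return keep, [[row[j] for j in keep] for row in X]


# ---- Stage 2: PPO clipped loss with an entropy regularisation term ----

def reward(action, label):
    """Hypothetical reward: +1 for a correct benign/malicious decision, -1 otherwise."""
    return 1.0 if action == label else -1.0


def categorical_entropy(probs):
    """Entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def ppo_loss(new_probs, old_probs, actions, advantages, clip_eps=0.2, ent_coef=0.01):
    """Batch-mean PPO-Clip loss to minimise: -(clipped surrogate + ent_coef * entropy).

    new_probs / old_probs: per-sample action distributions [p_benign, p_malicious]
    under the current policy and the policy that collected the data.
    The value-function loss is omitted in this sketch.
    """
    total = 0.0
    for p_new, p_old, a, adv in zip(new_probs, old_probs, actions, advantages):
        ratio = p_new[a] / p_old[a]  # probability ratio r_t for the taken action
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        surrogate = min(ratio * adv, clipped * adv)  # pessimistic (clipped) objective
        total += -(surrogate + ent_coef * categorical_entropy(p_new))
    return total / len(actions)


if __name__ == "__main__":
    X = [[0, 5], [0, 3], [1, 5], [1, 3]]  # toy data: feature 0 separates the classes
    y = [0, 0, 1, 1]
    print(feature_importances(X, y))  # [1.0, 0.0]
    print(select_features(X, y))      # ([0], [[0], [0], [1], [1]])
    print(ppo_loss([[0.2, 0.8]], [[0.5, 0.5]], [1], [1.0], ent_coef=0.0))  # -1.2
```

In the PPO call above, the probability ratio 1.6 is clipped to 1.2, so a single update cannot push the policy arbitrarily far from the one that collected the data. With `ent_coef > 0`, a policy whose action distribution stays more spread out gets a lower loss. This discourages the detector from collapsing prematurely onto one label, which is the usual motivation for an entropy term in PPO.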
