Wang Ruonan, Huang Minhuan, Zhao Jinjing, Zhang Hongzheng, Zhong Wenjing, Zhang Zhaowei, He Liqiang
Institute of Systems Engineering, Academy of Military Sciences, PLA, Beijing, 100101, China.
Sci Rep. 2024 Oct 21;14(1):24710. doi: 10.1038/s41598-024-73342-7.
Classifying malicious traffic, which helps trace the lineage of the malicious families behind attacks, is fundamental to safeguarding cybersecurity. However, current deep learning approaches require large volumes of data, which conflicts with the difficulty of acquiring and accurately labeling malicious traffic. In addition, the edge network devices that are vulnerable to cyber-attacks often cannot meet the computational demands of deploying deep learning models. The rapid mutation of malicious activity further underscores the need for models that generalize well enough to adapt to evolving threats. This paper introduces a few-shot malicious traffic classification method that is precise, lightweight, and generalizes better than existing approaches. Traditional transfer learning is refined by segmenting the source model into public and private feature extractors that are transferred stepwise, which improves the alignment of parameters with the specific target task. Neurons are then ranked by importance according to the task of each feature extractor, enabling precise pruning that yields a lightweight model. Finally, the parameters of the public feature extractor are retrained under the guidance of an adversarial network, strengthening the model's generalization. The method achieves an accuracy of over 97% on few-shot datasets with no more than 15 samples per class, has fewer than 50K model parameters, and generalizes better than the baseline methods.
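The pruning step described in the abstract (rank neurons by importance, then remove the least important) can be illustrated with a minimal sketch. The abstract does not specify how importance is scored, so the sketch assumes a simple stand-in: the L1 norm of each hidden neuron's outgoing weights. The function name `prune_neurons` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def prune_neurons(w_in, b_in, w_out, keep):
    """Keep the `keep` highest-scoring hidden neurons of one dense layer.

    w_in:  (d_in, h)  weights feeding the hidden layer
    b_in:  (h,)       hidden-layer biases
    w_out: (h, d_out) weights leaving the hidden layer
    """
    # ASSUMPTION: importance = L1 norm of a neuron's outgoing weights.
    # The paper ranks importance per feature-extractor task; its exact
    # criterion is not given in the abstract.
    importance = np.abs(w_out).sum(axis=1)

    # Indices of the top-`keep` neurons, restored to their original order.
    kept = np.sort(np.argsort(importance)[::-1][:keep])

    # Slice both adjacent weight matrices so the layer shapes stay consistent.
    return w_in[:, kept], b_in[kept], w_out[kept, :], kept


def count_params(*arrays):
    """Total number of scalar parameters across the given arrays."""
    return sum(a.size for a in arrays)
```

Pruning removes a hidden neuron's column in `w_in`, its bias, and its row in `w_out` together, so the parameter count drops on both sides of the layer; `count_params` can be used to compare the layer before and after.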