College of Information Engineering, Shanghai Maritime University, Shanghai, China.
College of Artificial Intelligence, Jiangxi University of Technology, Jiangxi, China.
PLoS One. 2024 Apr 18;19(4):e0298809. doi: 10.1371/journal.pone.0298809. eCollection 2024.
With the rapid development of the Internet, the continuous increase in malware and its variants has brought great challenges to cyber security. Because the data distribution is imbalanced, research on malware detection tends to focus on accuracy over the whole data sample while ignoring the detection rate for minority malware categories. In a typical dataset, normal samples account for the majority and malware samples for the minority, yet attacks from minority categories can cause great losses to countries, enterprises, and individuals. To address this problem, this study proposes the Gaussian noise generation strategy (GNGS), an algorithm that constructs a new balanced dataset so that the model pays more attention to learning the features of minority malware, thereby improving their detection rate. Traditional malware detection methods depend heavily on professional knowledge and static analysis, so we used a Transformer-based Self-Attention with Gate mechanism (SAG) to extract local and global features and filter out irrelevant noise, then extracted long-distance temporal dependency features with a BiGRU network, and obtained the classification results through a SoftMax classifier. We used the Alibaba Cloud dataset for multi-class malware classification. In comparisons between the GSB deep learning network model and other current studies, the experimental results showed that GNGS could resolve the imbalanced distribution of minority malware categories and that the SAG-BiGRU algorithm achieved an accuracy of 88.7% on eight-class classification, outperforming other existing algorithms. The GSB model also performed well on the NSL-KDD dataset, which indicates that it is effective for other network intrusion detection tasks as well.
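The abstract does not say how GNGS generates its samples, so the following is only a minimal sketch of the general idea of balancing classes with Gaussian noise, not the authors' algorithm. It assumes that each synthetic minority sample is a randomly chosen real sample of the same class plus zero-mean Gaussian noise scaled by that class's per-feature standard deviation. The function name `gaussian_noise_oversample` and the parameters `sigma` and `seed` are hypothetical.

```python
import numpy as np


def gaussian_noise_oversample(X, y, sigma=0.05, seed=0):
    """Balance classes by appending noisy copies of minority-class samples.

    Assumption (not taken from the paper): a synthetic sample is a real
    sample of the same class plus zero-mean Gaussian noise whose scale is
    sigma times the per-feature standard deviation of that class.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()  # bring every class up to the majority count

    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        deficit = int(target - count)
        if deficit == 0:
            continue  # majority class: nothing to add
        X_cls = X[y == cls]
        # Pick real samples (with replacement) to serve as noise centres.
        idx = rng.integers(0, len(X_cls), size=deficit)
        scale = sigma * X_cls.std(axis=0)
        noise = rng.normal(0.0, 1.0, size=(deficit, X.shape[1])) * scale
        X_parts.append(X_cls[idx] + noise)
        y_parts.append(np.full(deficit, cls))

    return np.concatenate(X_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    X_demo = demo_rng.normal(size=(100, 4))
    y_demo = np.array([0] * 90 + [1] * 10)
    X_bal, y_bal = gaussian_noise_oversample(X_demo, y_demo)
    print(np.bincount(y_bal))  # each class now has 90 samples
```

The original samples are kept unchanged at the front of the returned arrays, and only the synthetic rows are perturbed. A class with a single sample has zero standard deviation, so under this assumption its synthetic rows would be exact duplicates; a fixed noise floor would be one way to avoid that.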