Machine learning techniques for imbalanced multiclass malware classification through adaptive feature selection.

Authors

Panda Binayak, Bisoyi Sudhanshu Shekhar, Panigrahy Sidhanta, Mohanty Prithviraj

Affiliations

Department of Computer Science and Engineering, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India.

Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India.

Publication information

PeerJ Comput Sci. 2025 Mar 25;11:e2752. doi: 10.7717/peerj-cs.2752. eCollection 2025.

Abstract

Detecting polymorphic or metamorphic variants of known malware is a growing challenge, much like detecting entirely new malware. As the number of malware variants proliferates, artificial intelligence techniques are increasingly preferred over conventional signature-based detection. This article proposes an Adaptive Multiclass Malware Classification (AMMC) framework that trains base machine learning models with fewer computational resources to detect malware. It also proposes a novel adaptive feature selection (AFS) technique that applies a greedy strategy to term frequency-inverse document frequency (TF-IDF) feature weights in order to select influential features and improve performance metrics in imbalanced multiclass malware classification. AMMC with AFS was evaluated on Windows API sequence features from three open, imbalanced multiclass malware datasets: VirusShare (eight classes), VirusSample (six classes), and MAL-API-2019 (eight classes). AMMC with AFS achieved state-of-the-art performance on VirusShare, VirusSample, and MAL-API-2019, with macro F1-scores of 0.92, 0.94, and 0.84 and macro area under the curve (AUC) values of 0.99, 0.99, and 0.98, respectively.
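
The abstract does not spell out the AFS procedure, so the following is only a minimal sketch of the general idea it names: weight Windows API call tokens by TF-IDF, rank them, and greedily keep the highest-weighted ones. The function names (`tfidf_weights`, `greedy_select`, `macro_f1`), the smoothed IDF formula, and the `coverage` stopping rule are all assumptions made for illustration and are not taken from the paper. The `macro_f1` helper shows why macro averaging is the relevant metric for imbalanced classes: each class contributes equally regardless of its size.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(docs):
    """Return one {term: tf-idf weight} dict per document.

    Each document is a list of API-call tokens. The IDF term uses a common
    smoothed variant, ln((1 + N) / (1 + df)) + 1 (an assumption, not the
    paper's stated formula).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def greedy_select(weights, coverage=0.9):
    """Greedily keep the highest-weighted terms.

    Terms are ranked by their TF-IDF weight summed over all documents and
    added one at a time until the kept weight reaches `coverage` of the
    total. The stopping rule is hypothetical.
    """
    totals = defaultdict(float)
    for doc_weights in weights:
        for term, weight in doc_weights.items():
            totals[term] += weight
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    budget = coverage * sum(totals.values())
    selected, kept = [], 0.0
    for term, weight in ranked:
        if kept >= budget:
            break
        selected.append(term)
        kept += weight
    return selected


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy API-call sequences; real inputs would come from the datasets.
    samples = [
        ["CreateFileW", "WriteFile", "CloseHandle"],
        ["RegOpenKeyExW", "RegSetValueExW", "CloseHandle"],
        ["CreateFileW", "ReadFile", "CloseHandle"],
    ]
    print(greedy_select(tfidf_weights(samples), coverage=0.5))
```

In this sketch a classifier that always predicts the majority class can score high accuracy while its macro F1 stays low, which is the failure mode that macro-averaged metrics are meant to expose on imbalanced datasets.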
