Machine learning techniques for imbalanced multiclass malware classification through adaptive feature selection.

Author information

Panda Binayak, Bisoyi Sudhanshu Shekhar, Panigrahy Sidhanta, Mohanty Prithviraj

Affiliations

Department of Computer Science and Engineering, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India.

Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India.

Publication details

PeerJ Comput Sci. 2025 Mar 25;11:e2752. doi: 10.7717/peerj-cs.2752. eCollection 2025.

DOI:10.7717/peerj-cs.2752

PMID:40567739

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12190268/

Abstract

Detecting polymorphic or metamorphic variants of known malware is a growing challenge, as is detecting new malware. As malware variants proliferate, artificial intelligence techniques are increasingly preferred over conventional signature-based detection. This article proposes an Adaptive Multiclass Malware Classification (AMMC) framework that trains base machine learning models to detect malware using fewer computational resources. It also proposes a novel adaptive feature selection (AFS) technique that applies a greedy strategy to term frequency-inverse document frequency (TF-IDF) feature weights. AFS selects influential features and improves performance metrics in imbalanced multiclass malware classification. AMMC with AFS was evaluated on three open, imbalanced multiclass malware datasets built on Windows API call sequence features: VirusShare (eight classes), VirusSample (six classes), and MAL-API-2019 (eight classes). Experimental results demonstrate the effectiveness of AMMC with AFS, which achieves state-of-the-art performance on VirusShare, VirusSample, and MAL-API-2019. It reaches macro F1-scores of 0.92, 0.94, and 0.84 and macro area under the curve (AUC) values of 0.99, 0.99, and 0.98, respectively. The performance obtained with AMMC was highly promising across all datasets.
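The abstract describes AFS only at a high level: a greedy strategy over TF-IDF feature weights, judged by macro-averaged metrics on imbalanced classes. The minimal Python sketch below shows one plausible reading of that idea, not the authors' implementation. Each sample's API-call sequence is treated as a document. Candidate API calls are ranked by their summed TF-IDF weight on the training set, and each candidate is kept only if it strictly raises validation macro F1. The following are all assumptions made for illustration:

- the smoothed IDF formula;
- the cosine nearest-centroid base model (`centroid_predict`);
- the helper names;
- the stopping rule;
- the toy API-call data.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed IDF per API call: log((1 + n) / (1 + df)) + 1 (assumed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(docs, idf):
    """Sparse TF-IDF vectors (dicts); calls unseen when fitting the IDF are dropped."""
    vectors = []
    for doc in docs:
        tf = Counter(tok for tok in doc if tok in idf)
        length = len(doc) or 1
        vectors.append({tok: (c / length) * idf[tok] for tok, c in tf.items()})
    return vectors


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes weigh as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def _cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def centroid_predict(train_vecs, y_train, test_vecs, features):
    """Hypothetical base model: cosine nearest centroid on the selected features."""
    counts = Counter(y_train)
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(train_vecs, y_train):
        for f in features:
            sums[label][f] += vec.get(f, 0.0)
    centroids = {lab: [sums[lab][f] / counts[lab] for f in features] for lab in counts}
    preds = []
    for vec in test_vecs:
        x = [vec.get(f, 0.0) for f in features]
        # Ties (e.g. an all-zero vector) go to the alphabetically first class.
        preds.append(max(sorted(centroids), key=lambda lab: _cosine(x, centroids[lab])))
    return preds


def greedy_select(train_docs, y_train, val_docs, y_val, max_features=None):
    """Greedy forward selection over features ranked by summed training TF-IDF weight."""
    idf = fit_idf(train_docs)
    train_vecs, val_vecs = tfidf(train_docs, idf), tfidf(val_docs, idf)
    totals = Counter()
    for vec in train_vecs:
        totals.update(vec)
    ranked = sorted(totals, key=lambda f: (-totals[f], f))  # heaviest first, ties by name
    selected, best = [], 0.0
    for f in ranked[:max_features]:
        trial = selected + [f]
        score = macro_f1(y_val, centroid_predict(train_vecs, y_train, val_vecs, trial))
        if score > best:  # keep a feature only if it strictly improves validation macro F1
            selected, best = trial, score
    return selected, best


if __name__ == "__main__":
    # Toy API-call sequences (invented for illustration, not from the paper's datasets).
    train = [["CryptEncrypt", "WriteFile"], ["CryptEncrypt", "DeleteFileW"],
             ["RegSetValueExW", "WriteFile"], ["RegSetValueExW", "CreateRemoteThread"]]
    y_train = ["ransomware", "ransomware", "trojan", "trojan"]
    val = [["CryptEncrypt", "WriteFile"], ["RegSetValueExW", "WriteFile"]]
    y_val = ["ransomware", "trojan"]
    print(greedy_select(train, y_train, val, y_val))
    # → (['CryptEncrypt', 'RegSetValueExW'], 1.0)
```

Scoring candidates with macro F1 rather than accuracy means a rare malware family counts as much toward the selection criterion as a common one, which matters for imbalanced data. The forward pass costs one model fit per candidate feature. A realistic pipeline would likely use a stronger base model and cross-validation in place of a single validation split.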

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aab4/12190268/50cc9464a6ba/peerj-cs-11-2752-g001.jpg

Similar articles

BlockDroid: detection of Android malware from images using lightweight convolutional neural network models with ensemble learning and blockchain for mobile devices.

PeerJ Comput Sci. 2025 May 30;11:e2918. doi: 10.7717/peerj-cs.2918. eCollection 2025.

Fully Automated Online Adaptive Radiation Therapy Decision-Making for Cervical Cancer Using Artificial Intelligence.

Int J Radiat Oncol Biol Phys. 2025 Jul 15;122(4):1012-1021. doi: 10.1016/j.ijrobp.2025.04.012. Epub 2025 Apr 17.

Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.

Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.

Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.

Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.

Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.

A deep learning approach to direct immunofluorescence pattern recognition in autoimmune bullous diseases.

Br J Dermatol. 2024 Jul 16;191(2):261-266. doi: 10.1093/bjd/ljae142.

Atraumatic restorative treatment versus conventional restorative treatment for managing dental caries.

Cochrane Database Syst Rev. 2017 Dec 28;12(12):CD008072. doi: 10.1002/14651858.CD008072.pub2.

An ensemble approach for imbalanced multiclass malware classification using 1D-CNN.

PeerJ Comput Sci. 2023 Nov 14;9:e1677. doi: 10.7717/peerj-cs.1677. eCollection 2023.

A malware detection method with function parameters encoding and function dependency modeling.

PeerJ Comput Sci. 2025 Jun 13;11:e2946. doi: 10.7717/peerj-cs.2946. eCollection 2025.
