Enhancing malware detection with feature selection and scaling techniques using machine learning models.

Authors

Hasan Rakibul, Biswas Barna, Samiun Md, Saleh Mohammad Abu, Prabha Mani, Akter Jahanara, Joya Fatema Haque, Abdullah Masuk

Affiliations

Department of Business Administration, Westcliff University, 17877 Von Karman Ave 4th Floor, Irvine, CA, 92614, USA.

Department of Business Administration, International American University, 3440 Wilshire Blvd STE 1000, Los Angeles, CA, 90010, USA.

Publication information

Sci Rep. 2025 Mar 17;15(1):9122. doi: 10.1038/s41598-025-93447-x.

DOI: 10.1038/s41598-025-93447-x
PMID: 40097688
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11914577/
Abstract

The increasing prevalence of malware presents a critical challenge to cybersecurity, emphasizing the need for robust detection methods. This study uses a binary tabular classification dataset to evaluate the impact of feature selection, feature scaling, and machine learning (ML) models on malware detection. The methodology involves experimenting with three feature scaling techniques (no scaling, normalization, and min-max scaling), three feature selection methods (no selection, Linear Discriminant Analysis (LDA), and Principal Component Analysis (PCA)), and twelve ML models, including traditional algorithms and ensemble methods. A publicly available dataset with 11,598 samples and 139 features is utilized, and model performance is assessed using metrics such as accuracy, precision, recall, F1-score, and AUC-ROC. Results reveal that the Light Gradient Boosting Machine (LGBM) achieves the highest accuracy of 97.16% when PCA and either min-max scaling or normalization are applied. Additionally, ensemble models consistently outperform traditional ML models, demonstrating their effectiveness in enhancing malware detection. These findings offer valuable insights into optimizing preprocessing and model selection strategies for developing reliable and efficient malware detection systems.
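The preprocessing and evaluation steps named in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code: the abstract does not define "normalization", so the sketch assumes it means scaling each sample to unit Euclidean length, and the function names (`min_max_scale`, `l2_normalize`, `binary_metrics`) are hypothetical. LDA, PCA, and the twelve models are only enumerated by name here, not implemented.

```python
import math
from itertools import product


def min_max_scale(column):
    """Rescale one feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def l2_normalize(row):
    """Scale one sample (row) to unit Euclidean length.

    Assumption: this is what "normalization" means in the abstract.
    An all-zero row is returned unchanged.
    """
    norm = math.sqrt(sum(v * v for v in row))
    if norm == 0:
        return list(row)
    return [v / norm for v in row]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = malware)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# The three scaling options and three selection options listed in the abstract.
SCALING = ["none", "normalization", "min-max"]
SELECTION = ["none", "LDA", "PCA"]
N_MODELS = 12  # model count from the abstract; the individual models are not listed there

# Every (scaling, selection) pair. Whether the study ran the full
# factorial design is an assumption, not something the abstract states.
preprocessing_grid = list(product(SCALING, SELECTION))
n_configurations = len(preprocessing_grid) * N_MODELS

if __name__ == "__main__":
    print(min_max_scale([2, 4, 6]))
    print(l2_normalize([3, 4]))
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
    print(n_configurations)
```

Min-max scaling works per feature (column), whereas unit-norm scaling works per sample (row), which is one reason the two can interact differently with PCA. AUC-ROC is omitted because it needs predicted scores rather than hard labels.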



