Sentiment classification for insider threat identification using metaheuristic optimized machine learning classifiers.

Author information

Mladenovic Djordje, Antonijevic Milos, Jovanovic Luka, Simic Vladimir, Zivkovic Miodrag, Bacanin Nebojsa, Zivkovic Tamara, Perisic Jasmina

Affiliations

ICT College of vocational studies, Belgrade, Belgrade, 11000, Serbia.

Faculty of Informatics and Computing, Singidunum University, Belgrade, 11000, Serbia.

Publication information

Sci Rep. 2024 Oct 28;14(1):25731. doi: 10.1038/s41598-024-77240-w.

DOI:10.1038/s41598-024-77240-w

PMID:39468285

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11519568/

Abstract

This study examines the formidable and complex challenge of insider threats to organizational security, addressing risks such as ransomware incidents, data breaches, and extortion attempts. The research involves six experiments utilizing email, HTTP, and file content data. To combat insider threats, emerging Natural Language Processing techniques are employed in conjunction with powerful Machine Learning classifiers, specifically XGBoost and AdaBoost. The focus is on recognizing the sentiment and context of malicious actions, which are considered less prone to change compared to commonly tracked metrics like location and time of access. To enhance detection, a term frequency-inverse document frequency-based approach is introduced, providing a more robust, adaptable, and maintainable method. Moreover, the study acknowledges the significant impact of hyperparameter selection on classifier performance and employs various contemporary optimizers, including a modified version of the red fox optimization algorithm. The proposed approach undergoes testing in three simulated scenarios using a public dataset, showcasing commendable outcomes.
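The term frequency-inverse document frequency (TF-IDF) step mentioned in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the toy messages, the whitespace tokenizer, the function names, and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` are all assumptions chosen for illustration. In a pipeline like the one described, the resulting vectors would then be passed to a classifier such as XGBoost or AdaBoost whose hyperparameters are tuned by an optimizer; that part is not shown here.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocab, idf, vectors) for a list of raw text documents.

    Each vector is a dict mapping term -> L2-normalized TF-IDF weight.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    # Smoothed IDF (an assumed variant): terms found in every document
    # get weight 1.0, rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency (share of the document) times IDF.
        weights = {t: (c / total) * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vocab, idf, vectors


# Toy messages, invented for illustration only (not from the public dataset).
messages = [
    "please review the quarterly report",
    "send the report to the team",
    "i will leak the files unless they pay",
]
vocab, idf, vectors = tfidf_vectors(messages)

# Print the three highest-weighted terms of each message.
for text, vec in zip(messages, vectors):
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(text, "->", [term for term, _ in top])
```

Terms that occur in every message (here, "the") receive the lowest IDF, while terms confined to a single message (such as "leak") are weighted more heavily, which is what lets a downstream classifier focus on distinctive wording rather than common filler.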
