Enhancing Spam Message Classification and Detection Using Transformer-Based Embedding and Ensemble Learning.

Affiliations

Department of Computer Science, Jouf University, Sakaka 72388, Saudi Arabia.

Higher School of Sciences and Technology of Hammam Sousse, University of Sousse, Sousse 4011, Tunisia.

Publication information

Sensors (Basel). 2023 Apr 10;23(8):3861. doi: 10.3390/s23083861.

DOI:10.3390/s23083861

PMID:37112202

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10146782/

Abstract

Over the last decade, the Short Message Service (SMS) has become a primary communication channel. Its popularity, however, has also given rise to SMS spam. Spam messages are annoying and potentially malicious, exposing SMS users to credential theft and data loss. To mitigate this persistent threat, we propose a new model for SMS spam detection based on pre-trained Transformers and Ensemble Learning. The proposed model uses a text embedding technique that builds on recent advances in the GPT-3 Transformer. This technique provides a high-quality representation that can improve detection results. In addition, we used an Ensemble Learning method in which four machine learning models were combined into one model that performed significantly better than any of its individual constituents. The model was evaluated experimentally on the SMS Spam Collection Dataset. The results showed state-of-the-art performance, exceeding all previous works with an accuracy of 99.91%.
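The pipeline described in the abstract (embed each message, then combine four classifiers into one) can be illustrated with a minimal, offline sketch. Everything specific here is an assumption made for illustration: the abstract does not name the four base models or the combination rule, so this sketch uses nearest-centroid, 1-NN, 3-NN, and a perceptron with hard majority voting. The `embed` function is a hashed bag-of-words placeholder, not the GPT-3 embedding, and the training messages are invented toy data, not the SMS Spam Collection Dataset.

```python
import hashlib
import math

DIM = 64  # size of the stand-in embedding (assumption, not from the paper)

def embed(text):
    """Placeholder embedding: hashed bag-of-words, scaled to unit length.
    This stands in for the GPT-3 embedding so the sketch runs offline."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_centroid(X, y):
    """Nearest centroid: choose the class whose mean vector is most similar."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: max(centroids, key=lambda lab: dot(x, centroids[lab]))

def fit_knn(X, y, k):
    """k-nearest neighbours by dot product (vectors are unit length)."""
    def predict(x):
        ranked = sorted(zip(X, y), key=lambda pair: dot(x, pair[0]), reverse=True)
        top = [lab for _, lab in ranked[:k]]
        return max(sorted(set(top)), key=top.count)
    return predict

def fit_perceptron(X, y, epochs=20):
    """Linear perceptron with spam = +1 and ham = -1."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, lab in zip(X, y):
            target = 1.0 if lab == "spam" else -1.0
            if target * (dot(w, x) + b) <= 0:  # misclassified: update weights
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
    return lambda x: "spam" if dot(w, x) + b > 0 else "ham"

def majority_vote(votes):
    """Hard voting; a tie falls back to 'ham' (arbitrary choice for this sketch)."""
    return "spam" if votes.count("spam") > len(votes) / 2 else "ham"

def fit_ensemble(texts, labels):
    """Train four base models on the embeddings and wrap them in one classifier."""
    X = [embed(t) for t in texts]
    models = [fit_centroid(X, labels), fit_knn(X, labels, 1),
              fit_knn(X, labels, 3), fit_perceptron(X, labels)]
    return lambda text: majority_vote([m(embed(text)) for m in models])

# Invented toy messages, for illustration only.
train_texts = [
    "win a free prize now", "claim your free cash reward",
    "urgent your account needs verification click here",
    "are we still meeting for lunch", "call me when you get home",
    "see you at the office tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

classify = fit_ensemble(train_texts, train_labels)
print(classify("claim your free prize now"))
```

With a real embedding service and a labelled corpus, only `embed` and the training lists would need to change; the voting wrapper is independent of how the vectors are produced.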

Similar articles

SMS sentiment classification using an evolutionary optimization based fuzzy recurrent neural network.

Multimed Tools Appl. 2023 Apr 11:1-32. doi: 10.1007/s11042-023-15206-2.

An intelligent identification and classification system for malicious uniform resource locators (URLs).

Neural Comput Appl. 2023 Apr 20:1-17. doi: 10.1007/s00521-023-08592-z.

A systematic literature review on spam content detection and classification.

PeerJ Comput Sci. 2022 Jan 20;8:e830. doi: 10.7717/peerj-cs.830. eCollection 2022.

A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts.

Sensors (Basel). 2023 Nov 4;23(21):8975. doi: 10.3390/s23218975.

Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text.

Complex Intell Systems. 2022;8(6):4897-4909. doi: 10.1007/s40747-022-00741-6. Epub 2022 Apr 26.

Evading obscure communication from spam emails.

Math Biosci Eng. 2022 Jan;19(2):1926-1943. doi: 10.3934/mbe.2022091. Epub 2021 Dec 22.

Application of interval type-2 fuzzy logic and type-1 fuzzy logic-based approaches to social networks for spam detection with combined feature capabilities.

PeerJ Comput Sci. 2023 Apr 21;9:e1316. doi: 10.7717/peerj-cs.1316. eCollection 2023.

DSmishSMS-A System to Detect Smishing SMS.

Neural Comput Appl. 2023;35(7):4975-4992. doi: 10.1007/s00521-021-06305-y. Epub 2021 Jul 28.

Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning.

BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):129. doi: 10.1186/s12911-021-01492-z.

Cited by

Improving the accuracy of cybersecurity spam email detection using ensemble techniques: A stacking approach Machine learning for spam email detection.

PLoS One. 2025 Sep 3;20(9):e0331574. doi: 10.1371/journal.pone.0331574. eCollection 2025.

Scalable Learning Framework for Detecting New Types of Twitter Spam with Misuse and Anomaly Detection.

Sensors (Basel). 2024 Apr 2;24(7):2263. doi: 10.3390/s24072263.

Imbalanced class distribution and performance evaluation metrics: A systematic review of prediction accuracy for determining model performance in healthcare systems.

PLOS Digit Health. 2023 Nov 30;2(11):e0000290. doi: 10.1371/journal.pdig.0000290. eCollection 2023 Nov.

A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts.

Sensors (Basel). 2023 Nov 4;23(21):8975. doi: 10.3390/s23218975.
