Department of Computer Science, Jouf University, Sakaka 72388, Saudi Arabia.
Higher School of Sciences and Technology of Hammam Sousse, University of Sousse, Sousse 4011, Tunisia.
Sensors (Basel). 2023 Apr 10;23(8):3861. doi: 10.3390/s23083861.
Over the last decade, the Short Message Service (SMS) has become a primary communication channel. Its popularity, however, has also given rise to SMS spam. Spam messages are annoying and potentially malicious, exposing SMS users to credential theft and data loss. To mitigate this persistent threat, we propose a new model for SMS spam detection based on pre-trained Transformers and ensemble learning. The proposed model uses a text embedding technique built on recent advancements of the GPT-3 Transformer, which provides a high-quality representation that can improve detection results. In addition, we used an ensemble learning method in which four machine learning models were combined into one model that performed significantly better than its separate constituent parts. The model was evaluated experimentally on the SMS Spam Collection Dataset. The results showed state-of-the-art performance that exceeded all previous works, with an accuracy of 99.91%.
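The abstract does not name the four base models or the rule used to combine them, so the following is only a minimal sketch of the general idea, assuming soft voting (averaging each model's estimated spam probability). The toy logistic scorers, their weights, and the three-dimensional "embeddings" are hypothetical stand-ins for trained classifiers and GPT-3 text embeddings; none of them come from the paper.

```python
import math
from typing import Callable, List, Sequence

Embedding = Sequence[float]
# A base model maps a message embedding to an estimated P(spam) in [0, 1].
BaseModel = Callable[[Embedding], float]


def make_logistic(weights: Sequence[float], bias: float) -> BaseModel:
    """Build a toy logistic scorer (hypothetical stand-in for a trained model)."""
    def predict_proba(x: Embedding) -> float:
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba


def soft_vote(models: List[BaseModel], x: Embedding) -> float:
    """Return the mean spam probability across all base models."""
    if not models:
        raise ValueError("need at least one base model")
    return sum(model(x) for model in models) / len(models)


def classify(models: List[BaseModel], x: Embedding, threshold: float = 0.5) -> str:
    """Label a message 'spam' if the averaged probability reaches the threshold."""
    return "spam" if soft_vote(models, x) >= threshold else "ham"


# Four illustrative base models, each attending to different embedding dimensions.
models = [
    make_logistic([2.0, 0.0, 0.0], -1.0),
    make_logistic([0.0, 2.0, 0.0], -1.0),
    make_logistic([0.0, 0.0, 2.0], -1.0),
    make_logistic([1.0, 1.0, 1.0], -1.5),
]

# Made-up embeddings; a real pipeline would obtain these from an embedding model.
spam_like = [1.0, 1.0, 0.0]
ham_like = [0.0, 0.0, 1.0]

print(classify(models, spam_like))
print(classify(models, ham_like))
```

In this toy setup the third scorer alone disagrees with the other three on both inputs, yet the averaged vote follows the majority. This illustrates why a combined model can outperform its individual parts; it says nothing about the accuracy reported in the paper.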