
Enhancing Spam Message Classification and Detection Using Transformer-Based Embedding and Ensemble Learning.

Affiliations

Department of Computer Science, Jouf University, Sakaka 72388, Saudi Arabia.

Higher School of Sciences and Technology of Hammam Sousse, University of Sousse, Sousse 4011, Tunisia.

Publication information

Sensors (Basel). 2023 Apr 10;23(8):3861. doi: 10.3390/s23083861.

Abstract

Over the last decade, the Short Message Service (SMS) has become a primary communication channel. Its popularity, however, has also given rise to SMS spam: unsolicited messages that are not only annoying but potentially malicious, exposing SMS users to credential theft and data loss. To mitigate this persistent threat, we propose a new SMS spam detection model based on pre-trained Transformers and Ensemble Learning. The model uses a text embedding technique that builds on recent advances in the GPT-3 Transformer and provides a high-quality representation of each message, which can improve detection results. In addition, we use an Ensemble Learning method that combines four machine learning models into a single model that performs significantly better than any of its constituent models alone. The model was evaluated on the SMS Spam Collection Dataset, where it achieved state-of-the-art performance, exceeding all previous works with an accuracy of 99.91%.
