Shaaban Mai A, Hassan Yasser F, Guirguis Shawkat K
Department of Mathematics and Computer Science, Faculty of Science, Alexandria University, Alexandria, Egypt.
Faculty of Computers and Data Science, Alexandria University, Alexandria, Egypt.
Complex Intell Systems. 2022;8(6):4897-4909. doi: 10.1007/s40747-022-00741-6. Epub 2022 Apr 26.
The growing use of mobile messaging services has fueled the spread of social engineering attacks such as phishing, and spam text is one of the main vehicles for phishing attacks that aim to steal sensitive data such as credit card details and passwords. In addition, rumors and incorrect medical information about the COVID-19 pandemic are widely shared on social media, causing fear and confusion. Filtering spam content is therefore vital to reduce risks and threats. Previous studies relied on machine learning and deep learning approaches for spam classification, but each approach has a limitation: machine learning models require manual feature engineering, whereas deep neural networks incur a high computational cost. This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically. The proposed model uses convolutional and pooling layers for feature extraction, together with base classifiers such as random forests and extremely randomized trees, to classify texts as spam or legitimate. Moreover, the model employs ensemble learning procedures such as boosting and bagging. As a result, the model achieved high precision, recall, F1-score, and accuracy of 98.38%.
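The pipeline outlined in the abstract (convolution and pooling for feature extraction, followed by a bagged ensemble of randomized tree classifiers) can be illustrated with a toy sketch. This is not the authors' model. It uses only the Python standard library and makes several simplifying assumptions: the convolution filters are random and untrained, the base learners are depth-1 randomized stumps rather than full random forests or extremely randomized trees, and boosting and the dynamic complexity adjustment are omitted. All names (`embed`, `conv_max_pool`, `fit_random_stump`, `fit_ensemble`, `spam_score`) and the example messages are hypothetical.

```python
import random
import zlib

DIM, WIDTH, N_FILTERS = 4, 2, 8  # illustrative sizes, not taken from the paper

def embed(token):
    """Deterministic pseudo-embedding: seed an RNG with the token's CRC32."""
    rng = random.Random(zlib.crc32(token.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def conv_max_pool(text, filters):
    """Slide each filter over the token sequence, apply ReLU, then global max-pool."""
    vecs = [embed(t) for t in text.lower().split()]
    while len(vecs) < WIDTH:  # pad short messages so one window always exists
        vecs.append([0.0] * DIM)
    features = []
    for f in filters:  # each filter is WIDTH rows of DIM weights
        responses = []
        for i in range(len(vecs) - WIDTH + 1):
            window = vecs[i:i + WIDTH]
            s = sum(w * x for row, vec in zip(f, window) for w, x in zip(row, vec))
            responses.append(max(0.0, s))  # ReLU
        features.append(max(responses))  # global max pooling
    return features

def fit_random_stump(X, y, rng):
    """Randomized stump: random feature, random threshold, majority label per side."""
    j = rng.randrange(len(X[0]))
    col = [row[j] for row in X]
    thr = rng.uniform(min(col), max(col))
    above = [label for v, label in zip(col, y) if v > thr]
    below = [label for v, label in zip(col, y) if v <= thr]
    hi = int(2 * sum(above) > len(above)) if above else 0
    lo = int(2 * sum(below) > len(below)) if below else 0
    return j, thr, hi, lo

def fit_ensemble(X, y, n_estimators=50, seed=0):
    """Bagging: every stump is trained on its own bootstrap sample."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]
        stumps.append(fit_random_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return stumps

def spam_score(stumps, x):
    """Fraction of stumps voting 'spam' (label 1) for feature vector x."""
    votes = [hi if x[j] > thr else lo for j, thr, hi, lo in stumps]
    return sum(votes) / len(votes)

# Made-up toy data: 1 = spam, 0 = legitimate.
texts = [
    "win a free prize now", "claim your free cash reward",
    "urgent verify your account password", "are we still meeting at noon",
    "see you at dinner tonight", "thanks for the notes from class",
]
labels = [1, 1, 1, 0, 0, 0]

filter_rng = random.Random(42)
FILTERS = [[[filter_rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(WIDTH)]
           for _ in range(N_FILTERS)]

X = [conv_max_pool(t, FILTERS) for t in texts]
model = fit_ensemble(X, labels)
score = spam_score(model, conv_max_pool("free prize claim now", FILTERS))
print(f"spam vote fraction: {score:.2f}")
```

With six training messages and untrained filters the score carries no real predictive weight. The sketch only shows how the pooled convolution responses become a fixed-length feature vector that tree-based learners can consume without manual feature engineering.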