Department of Computer Science, Jouf University, Sakaka, Saudi Arabia.
Comput Intell Neurosci. 2022 Aug 9;2022:2500772. doi: 10.1155/2022/2500772. eCollection 2022.
E-mail service providers and consumers find it challenging to distinguish between spam and nonspam e-mails. Spammers aim to spread false information by sending annoying messages that catch the public's attention. Various spam identification techniques have been suggested and evaluated in the past, but the results show that more research is required to enhance accuracy and to reduce training time and error rate. Thus, this research proposes a novel machine learning-based hybrid bagging method for e-mail spam identification that combines two machine learning methods: random forest and J48 (decision tree). The proposed framework categorizes each e-mail as ham or spam. In this procedure, the database is split into multiple sets, which are provided as input to each method. Tokenization, stemming, and stop word removal are performed in the preprocessing stage, and correlation feature selection (CFS) is then employed to select the required features from the preprocessed data. The effectiveness of the presented method is evaluated in terms of true-negative rate, accuracy, recall, precision, false-positive rate, F-measure, and false-negative rate, and the outcomes of three studies are compared. According to the results, the presented hybrid bagged model-based SMD technology achieved 98 percent accuracy.
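The preprocessing stage described above (tokenization, stop word removal, stemming) can be sketched as follows. This is a minimal illustration, not the authors' code: the regex tokenizer, the tiny `STOP_WORDS` set, and the crude suffix-stripping `stem` function are assumptions standing in for a full stop-word list and a real stemmer such as Porter's. Stop words are removed before stemming here so the list can match unstemmed forms; the abstract does not state the order.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "or",
              "for", "in", "on", "you", "your"}

# Crude suffixes for an illustrative stemmer (not Porter); checked in this order.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("Click the links to claim your rewards"))
# ['click', 'link', 'claim', 'reward']
```

The resulting token lists would then be turned into features (for example, term presence per e-mail) before feature selection.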
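Correlation-based feature selection scores a feature subset higher when its features correlate strongly with the class and weakly with each other. The sketch below uses the standard CFS merit, Merit = k * mean|r_cf| / sqrt(k + k(k - 1) * mean|r_ff|), where k is the subset size. Absolute Pearson correlation and a simple greedy forward search are assumptions made for illustration, since the abstract does not name the correlation measure or the search strategy; a fuller implementation would typically use a more thorough search such as best-first.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(subset, columns, labels):
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|) for a list of feature names."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Average feature-class correlation (relevance).
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    # Average feature-feature correlation over all unordered pairs (redundancy).
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, labels):
    """Greedily add the feature that most raises merit; stop when no addition helps."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        # max() keeps the first feature (in dict order) among equal-merit candidates.
        feature = max(remaining, key=lambda f: cfs_merit(selected + [f], columns, labels))
        merit = cfs_merit(selected + [feature], columns, labels)
        if merit <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected


# Toy binary term-presence features; 1 = spam, 0 = ham.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "free": [1, 1, 1, 0, 0, 0],
    "free_copy": [1, 1, 1, 0, 0, 0],  # redundant duplicate of "free"
    "noise": [1, 0, 1, 0, 1, 0],
}
print(cfs_forward_select(columns, labels))
# ['free']
```

The duplicate column adds no merit once "free" is chosen, which is the redundancy penalty that distinguishes CFS from ranking features one at a time.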
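The general shape of a hybrid bagging ensemble, in which several resampled copies of the data train more than one kind of base learner and their predictions are combined, can be sketched as below. The abstract does not specify how the sets are drawn or how the random forest and J48 outputs are merged, so bootstrap sampling (the conventional choice in bagging), round-robin assignment of learner types to bags, and majority voting are all assumptions. The `stump_learner` factory is a hypothetical stand-in base learner used only to keep the example self-contained; it is not random forest or J48, which would occupy the learner slots in practice.

```python
import random
from collections import Counter


def bootstrap(rows, rng):
    """Draw len(rows) rows with replacement: one bag."""
    return [rng.choice(rows) for _ in rows]


def stump_learner(feature):
    """Hypothetical stand-in base learner: a one-feature stump over discrete values."""
    def fit(rows):
        counts = {}
        for x, y in rows:
            counts.setdefault(x[feature], Counter())[y] += 1
        # Majority label per observed feature value, plus a bag-wide fallback.
        table = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        fallback = Counter(y for _, y in rows).most_common(1)[0][0]
        return lambda x: table.get(x[feature], fallback)
    return fit


def hybrid_bagging_fit(rows, learner_fits, n_bags, seed=0):
    """Assign learner types to bags round-robin; train each on its own bootstrap bag."""
    rng = random.Random(seed)
    return [learner_fits[i % len(learner_fits)](bootstrap(rows, rng))
            for i in range(n_bags)]


def hybrid_bagging_predict(models, x):
    """Combine the trained models by simple majority vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Toy data: feature 0 marks spam; feature 1 carries no signal.
rows = [((1, 0), "spam"), ((1, 1), "spam"), ((0, 1), "ham"), ((0, 0), "ham")] * 5
models = hybrid_bagging_fit(rows, [stump_learner(0), stump_learner(1)], n_bags=11)
print(hybrid_bagging_predict(models, (1, 0)), hybrid_bagging_predict(models, (0, 1)))
```

With two classes, an odd number of bags rules out voting ties. Swapping in stronger learners only changes what is passed in `learner_fits`; the resampling and voting logic stays the same.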
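All seven evaluation measures listed above derive from the four confusion-matrix counts. The helper below assumes spam is the positive class (the abstract does not say which class is treated as positive) and returns 0.0 for any ratio whose denominator is zero, which is a reporting convention rather than part of the definitions. The counts in the usage line are hypothetical and are not results from the paper.

```python
def spam_metrics(tp, tn, fp, fn):
    """Confusion-matrix metrics with spam as the positive class."""
    def ratio(num, den):
        # Convention: report 0.0 instead of dividing by zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # true-positive rate
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "true_negative_rate": ratio(tn, tn + fp),   # ham correctly kept
        "false_positive_rate": ratio(fp, fp + tn),  # ham wrongly flagged as spam
        "false_negative_rate": ratio(fn, fn + tp),  # spam that got through
    }


# Hypothetical counts for 100 test e-mails (not results from the paper).
for name, value in spam_metrics(tp=45, tn=50, fp=2, fn=3).items():
    print(f"{name}: {value:.4f}")
```

Note that the true-negative and false-positive rates always sum to 1, as do recall and the false-negative rate, so reporting both members of each pair is mainly a matter of readability.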