Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique.

Affiliation

Department of Computer Science, Jouf University, Sakaka, Saudi Arabia.

Publication information

Comput Intell Neurosci. 2022 Aug 9;2022:2500772. doi: 10.1155/2022/2500772. eCollection 2022.

DOI:10.1155/2022/2500772
PMID:35983156
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9381222/
Abstract

E-mail service providers and consumers find it challenging to distinguish between spam and nonspam e-mails. Spammers aim to spread false information by sending annoying messages that catch the public's attention. Various spam identification techniques have been proposed and evaluated in the past, but the results show that more research is required to improve accuracy and to reduce training time and error rate. Thus, this research proposes a novel machine learning-based hybrid bagging method for e-mail spam identification that combines two machine learning methods: random forest and J48 (decision tree). The proposed framework classifies each e-mail as ham or spam. In this procedure, the database is split into multiple sets, which are provided as input to each method. Tokenization, stemming, and stop word removal are performed in the preprocessing stage, and correlation feature selection (CFS) is then used to select the required features from the preprocessed data. The effectiveness of the presented method is evaluated in terms of true-negative rate, accuracy, recall, precision, false-positive rate, F-measure, and false-negative rate; the outcomes of three studies are compared. According to the results, the presented hybrid bagged model-based SMD technology achieved 98 percent accuracy.
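The evaluation measures named in the abstract can all be derived from a binary confusion matrix. The sketch below is illustrative only and is not taken from the paper: it assumes "spam" is the positive class, uses the standard textbook definitions of each measure, and the names `Confusion`, `confusion`, and `metrics` are hypothetical helpers introduced here.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # spam correctly flagged as spam
    fp: int  # ham wrongly flagged as spam
    tn: int  # ham correctly passed as ham
    fn: int  # spam wrongly passed as ham


def confusion(y_true, y_pred, positive="spam"):
    """Count the four outcomes, treating `positive` as the positive class (an assumption)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return Confusion(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def metrics(c):
    """Standard definitions of the seven measures listed in the abstract."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "true_negative_rate": _ratio(c.tn, c.tn + c.fp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
    }


# Toy labels for illustration; these are not data from the study.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "ham", "spam"]
counts = confusion(y_true, y_pred)
scores = metrics(counts)
```

With this formulation the false-positive rate is the complement of the true-negative rate, and the false-negative rate is the complement of recall, so reporting all seven measures gives two redundant pairs.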

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/b99a42e47d5b/CIN2022-2500772.alg.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/06ef519dcf59/CIN2022-2500772.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/dbde6a110ee3/CIN2022-2500772.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/6701cb71029b/CIN2022-2500772.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/8c9ad9cc64e6/CIN2022-2500772.004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/c66e19f6aa80/CIN2022-2500772.005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/2d0dd4a8cd92/CIN2022-2500772.006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/6a4f0de5822f/CIN2022-2500772.007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/a04f149006f4/CIN2022-2500772.008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/fdb8e568ad2b/CIN2022-2500772.009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c81/9381222/38103ad91cc5/CIN2022-2500772.alg.001.jpg

Similar articles

1
Analysis of e-Mail Spam Detection Using a Novel Machine Learning-Based Hybrid Bagging Technique.
Comput Intell Neurosci. 2022 Aug 9;2022:2500772. doi: 10.1155/2022/2500772. eCollection 2022.
2
Efficient E-Mail Spam Detection Strategy Using Genetic Decision Tree Processing with NLP Features.
Comput Intell Neurosci. 2022 Mar 24;2022:7710005. doi: 10.1155/2022/7710005. eCollection 2022.
3
Efficient information theoretic strategies for classifier combination, feature extraction and performance evaluation in improving false positives and false negatives for spam e-mail filtering.
Neural Netw. 2005 Jun-Jul;18(5-6):799-807. doi: 10.1016/j.neunet.2005.06.045.
4
Evading obscure communication from spam emails.
Math Biosci Eng. 2022 Jan;19(2):1926-1943. doi: 10.3934/mbe.2022091. Epub 2021 Dec 22.
5
SMS sentiment classification using an evolutionary optimization based fuzzy recurrent neural network.
Multimed Tools Appl. 2023 Apr 11:1-32. doi: 10.1007/s11042-023-15206-2.
6
Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text.
Complex Intell Systems. 2022;8(6):4897-4909. doi: 10.1007/s40747-022-00741-6. Epub 2022 Apr 26.
7
[Spams in doctors' mailbox: their threat to health education, to patient information and to scientific research].
Orv Hetil. 2019 Oct;160(43):1706-1710. doi: 10.1556/650.2019.31531.
8
Application of interval type-2 fuzzy logic and type-1 fuzzy logic-based approaches to social networks for spam detection with combined feature capabilities.
PeerJ Comput Sci. 2023 Apr 21;9:e1316. doi: 10.7717/peerj-cs.1316. eCollection 2023.
9
An efficient incremental learning mechanism for tracking concept drift in spam filtering.
PLoS One. 2017 Feb 9;12(2):e0171518. doi: 10.1371/journal.pone.0171518. eCollection 2017.
10
A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts.
Sensors (Basel). 2023 Nov 4;23(21):8975. doi: 10.3390/s23218975.

Cited by

1
Imbalanced class distribution and performance evaluation metrics: A systematic review of prediction accuracy for determining model performance in healthcare systems.
PLOS Digit Health. 2023 Nov 30;2(11):e0000290. doi: 10.1371/journal.pdig.0000290. eCollection 2023 Nov.