SMS Scam Detection Application Based on Optical Character Recognition for Image Data Using Unsupervised and Deep Semi-Supervised Learning.

Authors

Shinde Anjali, Shahra Essa Q, Basurra Shadi, Saeed Faisal, AlSewari Abdulrahman A, Jabbar Waheb A

Affiliation

Faculty of Computing, Engineering and Built Environment, Birmingham City University, Birmingham B4 7RQ, UK.

Publication

Sensors (Basel). 2024 Sep 20;24(18):6084. doi: 10.3390/s24186084.

DOI: 10.3390/s24186084
PMID: 39338829
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11435858/
Abstract

The growing problem of unsolicited text messages (smishing) and data irregularities necessitates stronger spam detection solutions. This paper explores the development of a sophisticated model designed to identify smishing messages by understanding the complex relationships among words, images, and context-specific factors, areas that remain underexplored in existing research. To address this, we merge a UCI spam dataset of regular text messages with real-world spam data, leveraging OCR technology for comprehensive analysis. The study employs a combination of traditional machine learning models, including K-means, Non-Negative Matrix Factorization, and Gaussian Mixture Models, along with feature extraction techniques such as TF-IDF and PCA. Additionally, deep learning models like RNN-Flatten, LSTM, and Bi-LSTM are utilized. The selection of these models is driven by their complementary strengths in capturing both the linear and non-linear relationships inherent in smishing messages. Machine learning models are chosen for their efficiency in handling structured text data, while deep learning models are selected for their superior ability to capture sequential dependencies and contextual nuances. The performance of these models is rigorously evaluated using metrics like accuracy, precision, recall, and F1 score, enabling a comparative analysis between the machine learning and deep learning approaches. Notably, the K-means feature extraction with vectorizer achieved 91.01% accuracy, and the KNN-Flatten model reached 94.13% accuracy, emerging as the top performer. The rationale behind highlighting these models is their potential to significantly improve smishing detection rates. For instance, the high accuracy of the KNN-Flatten model suggests its applicability in real-time spam detection systems, but its computational complexity might limit scalability in large-scale deployments. 
Similarly, while K-means with vectorizer excels in accuracy, it may struggle with the dynamic and evolving nature of smishing attacks, necessitating continual retraining.
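The abstract does not spell out the unsupervised pipeline. As a rough illustration only, the sketch below turns message text (assumed to be already extracted, e.g. by OCR from screenshots) into L2-normalised TF-IDF vectors and groups them with plain K-means (Lloyd's algorithm, k = 2). The toy messages, the smoothed IDF formula, the farthest-first initialisation and the helper names `tokenize`, `tfidf_vectors` and `kmeans` are hypothetical choices for this example, not the authors' code or settings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors), one vector per document.

    TF is the raw term count; IDF uses the smoothed form log((1 + N) / (1 + df)) + 1.
    This is one common variant, assumed here for illustration.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard against empty messages
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


messages = [
    "Claim your FREE prize now at this link",
    "Your account is locked, verify now at this link",
    "Are we still on for lunch tomorrow?",
    "Lunch tomorrow sounds good, see you then",
]
vocab, vectors = tfidf_vectors(messages)
labels = kmeans(vectors, k=2)
print(labels)  # expected: [0, 0, 1, 1] (scam-style messages together, chats together)
```

Farthest-first seeding is used here only to keep the toy run deterministic; library implementations of K-means typically seed with k-means++ and run several random restarts.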
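The abstract ranks models by accuracy, precision, recall and F1. A minimal sketch of how these four metrics follow from confusion-matrix counts, assuming smishing/spam is encoded as the positive class 1 and ham as 0. Because clustering methods such as K-means output arbitrary cluster IDs, the hypothetical helper `map_clusters_to_labels` relabels each cluster by majority vote before scoring; this is one common evaluation convention, not necessarily the one used in the paper. The label vectors are made up for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` (here: spam) as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def map_clusters_to_labels(clusters, y_true):
    """Replace each cluster ID with the most frequent true label among its members."""
    majority = {}
    for c in set(clusters):
        members = [t for k, t in zip(clusters, y_true) if k == c]
        majority[c] = Counter(members).most_common(1)[0][0]
    return [majority[c] for c in clusters]


# Made-up example: 1 = spam/smishing, 0 = ham.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
clusters = [0, 0, 1, 1, 1, 1, 0, 1]  # raw cluster IDs from some clustering run
y_pred = map_clusters_to_labels(clusters, y_true)
print(y_pred)  # [1, 1, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))  # accuracy 0.75; precision, recall, F1 all 2/3
```

Precision and recall pull in different directions for spam filtering: a false positive hides a legitimate message, while a false negative lets a scam through, which is why F1 is reported alongside plain accuracy.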

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/ce86b8e2f90d/sensors-24-06084-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/1cba31e4a496/sensors-24-06084-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/407c6a1b2990/sensors-24-06084-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/4c316ed70175/sensors-24-06084-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/a06ef71cde54/sensors-24-06084-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/40d5f426d95d/sensors-24-06084-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/26cd83c8e4ae/sensors-24-06084-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/8a42588a6173/sensors-24-06084-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/f622f2dee1dc/sensors-24-06084-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/a9f75db760d5/sensors-24-06084-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b0a8/11435858/d6e05fffed36/sensors-24-06084-g011.jpg

Similar articles

1
SMS Scam Detection Application Based on Optical Character Recognition for Image Data Using Unsupervised and Deep Semi-Supervised Learning.
Sensors (Basel). 2024 Sep 20;24(18):6084. doi: 10.3390/s24186084.
2
DSmishSMS-A System to Detect Smishing SMS.
Neural Comput Appl. 2023;35(7):4975-4992. doi: 10.1007/s00521-021-06305-y. Epub 2021 Jul 28.
3
Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text.
Complex Intell Systems. 2022;8(6):4897-4909. doi: 10.1007/s40747-022-00741-6. Epub 2022 Apr 26.
4
Implementation of 'Smishing Detector': An Efficient Model for Smishing Detection Using Neural Network.
SN Comput Sci. 2022;3(3):189. doi: 10.1007/s42979-022-01078-0. Epub 2022 Mar 15.
5
Investigating response behavior through TF-IDF and Word2vec text analysis: A case study of PISA 2012 problem-solving process data.
Heliyon. 2024 Aug 10;10(16):e35945. doi: 10.1016/j.heliyon.2024.e35945. eCollection 2024 Aug 30.
6
COVID-19 diagnosis: A comprehensive review of pre-trained deep learning models based on feature extraction algorithm.
Results Eng. 2023 Jun;18:101020. doi: 10.1016/j.rineng.2023.101020. Epub 2023 Mar 16.
7
Epileptic Patient Activity Recognition System Using Extreme Learning Machine Method.
Biomedicines. 2023 Mar 7;11(3):816. doi: 10.3390/biomedicines11030816.
8
Bayesian optimized multimodal deep hybrid learning approach for tomato leaf disease classification.
Sci Rep. 2024 Sep 14;14(1):21525. doi: 10.1038/s41598-024-72237-x.
9
Enhancing Spam Message Classification and Detection Using Transformer-Based Embedding and Ensemble Learning.
Sensors (Basel). 2023 Apr 10;23(8):3861. doi: 10.3390/s23083861.
10
Sentiment Analysis and Comprehensive Evaluation of Supervised Machine Learning Models Using Twitter Data on Russia-Ukraine War.
SN Comput Sci. 2023;4(4):346. doi: 10.1007/s42979-023-01790-5. Epub 2023 Apr 21.
