Textual and visual content-based anti-phishing: a Bayesian approach.

Authors

Zhang Haijun, Liu Gang, Chow Tommy W S, Liu Wenyin

Affiliation

Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong.

Publication information

IEEE Trans Neural Netw. 2011 Oct;22(10):1532-46. doi: 10.1109/TNN.2011.2161999. Epub 2011 Aug 4.

DOI: 10.1109/TNN.2011.2161999

PMID: 21824844

Abstract

A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold. This is required in the classifier for determining the class of the web page and identifying whether the web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, the Bayes theory is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined in a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, the fusion algorithm outperforms either of the individual classifiers, and our model can be adapted to different phishing cases.
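
The abstract names two Bayesian steps: a naive Bayes text classifier that yields the probability that a page is phishing, and a Bayes-rule fusion of the textual and visual results. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes tokenized pages, a multinomial naive Bayes model with Laplace smoothing, and a fusion rule that treats the two classifiers as conditionally independent given the class. All function names, parameters, and the toy data are hypothetical; the paper's own fusion algorithm and its Bayesian threshold estimation are not reproduced here.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on tokenized documents.

    alpha is the Laplace smoothing constant (an assumption of this sketch).
    """
    vocab = {tok for doc in docs for tok in doc}
    counts = {c: Counter() for c in set(labels)}
    n_docs = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    model = {}
    for c, tok_counts in counts.items():
        total = sum(tok_counts.values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(n_docs[c] / len(docs)),
            "log_lik": {t: math.log((tok_counts[t] + alpha) / total) for t in vocab},
        }
    return model


def posterior(model, doc, positive="phishing"):
    """Return P(positive | doc); tokens outside the vocabulary are ignored."""
    scores = {}
    for c, params in model.items():
        scores[c] = params["log_prior"] + sum(
            params["log_lik"][t] for t in doc if t in params["log_lik"]
        )
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    return exp[positive] / sum(exp.values())


def fuse(p_text, p_image, prior=0.5):
    """Fuse two posteriors for the same class via Bayes' rule.

    Assumes both classifiers share the same class prior and are
    conditionally independent given the class, so that
    P(c | text, image) is proportional to P(c | text) * P(c | image) / P(c).
    """
    pos = p_text * p_image / prior
    neg = (1.0 - p_text) * (1.0 - p_image) / (1.0 - prior)
    return pos / (pos + neg)


# Toy data, invented purely for illustration.
docs = [
    ["verify", "account", "password"],
    ["login", "password", "urgent"],
    ["weather", "news", "sports"],
    ["news", "account", "sports"],
]
labels = ["phishing", "phishing", "legit", "legit"]
model = train_naive_bayes(docs, labels)

p_text = posterior(model, ["password", "urgent"])
p_image = 0.7  # stand-in for a visual-similarity classifier's output
p_fused = fuse(p_text, p_image)
```

Under the independence assumption, two classifiers that both lean toward "phishing" reinforce each other, so the fused probability exceeds either input, while an uninformative input of 0.5 leaves the other classifier's result unchanged.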

Similar articles

Textual and visual content-based anti-phishing: a Bayesian approach.

IEEE Trans Neural Netw. 2011 Oct;22(10):1532-46. doi: 10.1109/TNN.2011.2161999. Epub 2011 Aug 4.

Recognition of pornographic web pages by classifying texts and images.

IEEE Trans Pattern Anal Mach Intell. 2007 Jun;29(6):1019-34. doi: 10.1109/TPAMI.2007.1133.

Learning image similarity from Flickr groups using fast kernel machines.

IEEE Trans Pattern Anal Mach Intell. 2012 Nov;34(11):2177-88. doi: 10.1109/TPAMI.2012.29.

A hierarchical Naïve Bayes Model for handling sample heterogeneity in classification problems: an application to tissue microarrays.

BMC Bioinformatics. 2006 Nov 24;7:514. doi: 10.1186/1471-2105-7-514.

A Bayesian approach to joint feature selection and classifier design.

IEEE Trans Pattern Anal Mach Intell. 2004 Sep;26(9):1105-11. doi: 10.1109/TPAMI.2004.55.

Optimal classifier fusion in a non-Bayesian probabilistic framework.

IEEE Trans Pattern Anal Mach Intell. 2009 Sep;31(9):1630-44. doi: 10.1109/TPAMI.2008.224.

Bayesian Gaussian process classification with the EM-EP algorithm.

IEEE Trans Pattern Anal Mach Intell. 2006 Dec;28(12):1948-59. doi: 10.1109/TPAMI.2006.238.

Continuous time Bayesian network classifiers.

J Biomed Inform. 2012 Dec;45(6):1108-19. doi: 10.1016/j.jbi.2012.07.002. Epub 2012 Jul 28.

Is Domain Highlighting Actually Helpful in Identifying Phishing Web Pages?

Hum Factors. 2017 Jun;59(4):640-660. doi: 10.1177/0018720816684064. Epub 2017 Jan 6.

Novel layered clustering-based approach for generating ensemble of classifiers.

IEEE Trans Neural Netw. 2011 May;22(5):781-92. doi: 10.1109/TNN.2011.2118765. Epub 2011 Apr 11.

Cited by

A hybrid DNN-LSTM model for detecting phishing URLs.

Neural Comput Appl. 2023;35(7):4957-4973. doi: 10.1007/s00521-021-06401-z. Epub 2021 Aug 8.

Data mining and machine learning approaches for prediction modelling of disease vectors: Epidemic disease prediction modelling.

Int J Mach Learn Cybern. 2020;11(6):1159-1178. doi: 10.1007/s13042-019-01029-x. Epub 2019 Nov 18.

Biomarker identification of hepatocellular carcinoma using a methodical literature mining strategy.

Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax082.

Biomarker identification using text mining.

Comput Math Methods Med. 2012;2012:135780. doi: 10.1155/2012/135780. Epub 2012 Nov 11.
