Cagatay Catal, Görkem Giray, Bedir Tekinerdogan, Sandeep Kumar, Suyash Shukla
Department of Computer Science and Engineering, Qatar University, Doha, Qatar.
Independent Researcher, Izmir, Turkey.
Knowl Inf Syst. 2022;64(6):1457-1500. doi: 10.1007/s10115-022-01672-x. Epub 2022 May 23.
Phishing attacks aim to steal confidential information using sophisticated methods, techniques, and tools, such as content injection, social engineering, online social networks, and mobile applications. To avoid and mitigate the risks of these attacks, several phishing detection approaches have been developed, among which deep learning algorithms have shown promising results. However, these results and the corresponding lessons learned are scattered across many studies, and a systematic overview of the use of deep learning algorithms in phishing detection is lacking. Hence, we performed a systematic literature review (SLR) to identify, assess, and synthesize the results on deep learning approaches for phishing detection reported in the selected scientific publications. We address nine research questions and provide an overview of how deep learning algorithms have been used for phishing detection from several aspects. In total, 43 journal articles were selected from electronic databases to answer the defined research questions. Our SLR shows that all but one of the reviewed studies applied supervised deep learning algorithms. The most widely used data sources were URL-related data, third-party information about the website, website content-related data, and email. The most frequently used deep learning algorithms were deep neural networks (DNN), convolutional neural networks, and recurrent neural networks/long short-term memory networks. DNN and hybrid deep learning algorithms provided the best performance among the deep learning-based algorithms. Seventy-two percent of the studies did not apply any feature selection algorithm to build the prediction model. PhishTank was the most frequently used dataset. While Keras and TensorFlow were the most preferred deep learning frameworks, 46% of the articles did not mention any framework.
This study also highlights several challenges in phishing detection to pave the way for further research.
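The abstract names URL-related data as one of the most widely used data sources and DNN, CNN, and RNN/LSTM as the most common model families. As an illustration only, the following is a minimal sketch of how a URL might be turned into model input under two common assumptions: a fixed-length character-id sequence (the kind of input a character-level CNN or LSTM could consume) and a small vector of hand-crafted lexical features (the kind a DNN could consume). The vocabulary, the `max_len=200` padding length, and the specific features are hypothetical choices for this sketch, not methods taken from the reviewed studies. It covers preprocessing only and trains no model.

```python
import re
from urllib.parse import urlparse

# Hypothetical vocabulary: printable ASCII (codes 32..126) mapped to ids 1..95.
# Id 0 is reserved for padding and for characters outside the vocabulary.
VOCAB = {chr(c): i + 1 for i, c in enumerate(range(32, 127))}

# Matches a dotted-quad IPv4 address used as the host (illustrative heuristic).
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def encode_url_chars(url: str, max_len: int = 200) -> list[int]:
    """Map each character to an integer id, then truncate or right-pad to max_len.

    The result is a fixed-length sequence suitable for an embedding layer
    in a character-level CNN or LSTM (assumed use, not from the source).
    """
    ids = [VOCAB.get(ch, 0) for ch in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


def lexical_features(url: str) -> dict[str, int]:
    """Compute a few hand-crafted URL features (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                   # long URLs can hide the real target
        "num_dots": url.count("."),               # many dots suggest nested subdomains
        "has_at_symbol": int("@" in url),         # '@' can obscure the destination
        "host_is_ip": int(bool(IPV4.match(host))),  # raw IP instead of a domain name
        "hyphen_in_host": int("-" in host),       # e.g. look-alike brand domains
        "uses_https": int(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    url = "http://192.168.0.1/secure-login@paypal.com"
    print(lexical_features(url))
    print(encode_url_chars(url)[:10])
```

Keeping the two representations separate mirrors the distinction the review draws between models that learn directly from raw URL characters and models built on engineered features; in practice either output would be converted to an array before being passed to a framework such as Keras or TensorFlow.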