Ozcan Alper, Catal Cagatay, Donmez Emrah, Senturk Behcet
Department of Computer Engineering, Nisantasi University, Istanbul, Turkey.
Department of Computer Engineering, Akdeniz University, Antalya, Turkey.
Neural Comput Appl. 2023;35(7):4957-4973. doi: 10.1007/s00521-021-06401-z. Epub 2021 Aug 8.
Phishing is an attack that imitates the official websites of organizations such as banks, e-commerce companies, financial institutions, and government agencies. Phishing websites aim to obtain users' sensitive information, such as personal identification, social security numbers, passwords, e-mail addresses, credit card details, and other account information. Several anti-phishing techniques have been developed so far to cope with the increasing number of phishing attacks. Machine learning and, in particular, deep learning algorithms are nowadays the most crucial techniques for detecting and preventing phishing attacks because of their strong learning abilities on massive datasets and their state-of-the-art results in many classification problems. Previously, two types of feature extraction techniques [i.e., character embedding-based and manual natural language processing (NLP) feature extraction] were used in isolation. Researchers did not consolidate these features, and therefore the performance was not remarkable. Unlike previous works, our study presents an approach that utilizes both feature extraction techniques, and we discuss how to combine them to make full use of the available data. This paper proposes hybrid deep learning models based on long short-term memory (LSTM) and deep neural network (DNN) algorithms for detecting phishing uniform resource locators (URLs) and evaluates the performance of the models on phishing datasets. The proposed hybrid deep learning models utilize both character embedding and NLP features, thereby simultaneously exploiting deep connections between characters and revealing NLP-based high-level connections. Experimental results showed that the proposed models outperform the other phishing detection models in terms of accuracy.
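To make the idea of combining the two feature types more concrete, the following is a minimal sketch of how a single URL might be turned into the two inputs that a hybrid model would consume: a padded sequence of character indices (for an embedding layer followed by an LSTM) and a small vector of hand-crafted NLP-style features (for a dense network). The abstract does not specify the vocabulary, sequence length, or feature list, so `VOCAB`, `MAX_LEN`, `SUSPICIOUS_WORDS`, and all function names here are illustrative assumptions rather than the authors' implementation.

```python
import string
from urllib.parse import urlsplit

# Assumed vocabulary: printable ASCII. Index 0 is padding, 1 is "unknown".
VOCAB = {ch: i + 2 for i, ch in enumerate(string.printable)}
MAX_LEN = 200  # assumed fixed sequence length, not taken from the paper

# Illustrative keyword list; the paper's actual NLP features are not given here.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")


def encode_chars(url, max_len=MAX_LEN):
    """Map a URL to a fixed-length list of character indices (right-padded with 0)."""
    ids = [VOCAB.get(ch, 1) for ch in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


def nlp_features(url):
    """Compute a few hand-crafted lexical features (hypothetical feature set)."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    lowered = url.lower()
    return [
        float(len(url)),                                     # total URL length
        float(len(host)),                                    # host name length
        float(host.count(".")),                              # dots in the host
        float(sum(ch.isdigit() for ch in url)),              # digit count
        float(url.count("-")),                               # hyphen count
        float("@" in url),                                   # '@' present
        float(sum(w in lowered for w in SUSPICIOUS_WORDS)),  # keyword hits
    ]


def hybrid_inputs(url):
    """Return (character index sequence, NLP feature vector) for one URL."""
    return encode_chars(url), nlp_features(url)


chars, feats = hybrid_inputs("http://secure-login.example.com/verify")
```

In a hybrid architecture of the kind described, `chars` would feed the character-embedding and LSTM branch, `feats` would feed the DNN branch, and the two branch outputs would be concatenated before the final classification layer.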