Das Guptta Sumitra, Shahriar Khandaker Tayef, Alqahtani Hamed, Alsalman Dheyaaldin, Sarker Iqbal H
Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, Chittagong, 4349 Bangladesh.
Unit of Cybersecurity, Department of Computer Science, Center of Artificial Intelligence, King Khalid University, Abha, Saudi Arabia.
Ann Data Sci. 2022 Mar 21:1-26. doi: 10.1007/s40745-022-00379-8.
In this paper, we present a machine learning based approach to detecting phishing websites that achieves high accuracy without relying on any third-party systems. In phishing, attackers typically try to deceive internet users by disguising a webpage as an official, genuine webpage in order to steal sensitive information such as usernames, passwords, social security numbers, and credit card information. Anti-phishing solutions such as blacklist or whitelist, heuristic, and visual similarity based methods cannot detect zero-hour phishing attacks or brand-new websites. Moreover, earlier approaches are complex and unsuitable for real-time environments because they depend on third-party sources such as search engines. Hence, detecting newly created phishing websites in a real-time environment is a major challenge in the domain of cybersecurity. To overcome these problems, this paper proposes a strategy that extracts features only from client-side URL and hyperlink information. We also develop a new dataset for conducting experiments with popular machine learning classification techniques. Our experimental results show that the proposed phishing detection approach is more effective than traditional approaches, achieving a detection accuracy of 99.17% with the XGBoost technique.
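To make "client-side URL and hyperlink features" concrete, here is a minimal sketch using only the Python standard library. The specific features below (URL length, dots in the host, an `@` in the URL, an IP-address host, HTTPS use, null-link ratio, foreign-link ratio) are hypothetical illustrations chosen for this example, not the paper's actual feature set, and the function name `extract_features` is likewise an assumption. The point it shows is that every value is computed from the page URL and its HTML alone, with no search-engine or other third-party lookup; training a classifier such as XGBoost on these vectors is left out.

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the href values of <a> tags found in raw HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value.strip())


def _is_ip(host):
    """True if the hostname is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(page_url, html):
    """Hypothetical client-side feature vector built from a URL and its HTML only."""
    parsed = urlparse(page_url)
    host = parsed.hostname or ""

    collector = _AnchorCollector()
    collector.feed(html)
    collector.close()
    hrefs = collector.hrefs
    total = len(hrefs)

    # "Null" links point nowhere: empty, bare fragment, or javascript: pseudo-URLs.
    null_links = sum(
        1 for h in hrefs if h in ("", "#") or h.lower().startswith("javascript:")
    )

    # Foreign links resolve (relative to the page) to a different hostname.
    foreign = 0
    for h in hrefs:
        target = urlparse(urljoin(page_url, h))
        if target.hostname and target.hostname != host:
            foreign += 1

    return {
        "url_length": len(page_url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": int("@" in page_url),
        "host_is_ip": int(_is_ip(host)),
        "uses_https": int(parsed.scheme == "https"),
        "total_links": total,
        "null_link_ratio": null_links / total if total else 0.0,
        "foreign_link_ratio": foreign / total if total else 0.0,
    }


if __name__ == "__main__":
    url = "http://192.168.0.1/login@secure"
    html = (
        '<a href="#">x</a>'
        '<a href="https://evil.example.com/a">y</a>'
        '<a href="/local">z</a>'
        '<a href="javascript:void(0)">w</a>'
    )
    print(extract_features(url, html))
```

For the sample page, two of the four links are null (`#` and `javascript:`) and one points to another host, so the null-link ratio is 0.5 and the foreign-link ratio is 0.25. A dictionary per page can then be turned into a row of a feature matrix for whichever classifier is being evaluated.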