Rao Routhu Srinivasa, Kondaiah Cheemaladinne, Pais Alwyn Roshan, Lee Bumshik
Department of Computer Science and Engineering, Gandhi Institute of Technology and Management, Visakhapatnam, Andhra Pradesh, 530045, India.
Department of Information and Communication Engineering, Chosun University, Gwangju, 61452, Republic of Korea.
Sci Rep. 2025 May 15;15(1):16839. doi: 10.1038/s41598-025-02009-8.
The rapid growth in online users and network traffic has made security harder to ensure, and phishing remains one of the most significant cyber threats. In a phishing attack, the attacker steals sensitive information, such as usernames, passwords, and credit card details, through fake web pages designed to mimic legitimate websites. These attacks are delivered primarily through emails or websites. Several anti-phishing techniques have been developed to counter phishing websites, including blacklist-based, source code analysis, and visual similarity-based methods. However, these methods have specific limitations, including vulnerability to zero-day attacks, susceptibility to drive-by downloads, and high detection latency. Many of them are also unsuitable for mobile devices, which face additional constraints such as limited RAM, smaller screens, and lower computational power. To address these limitations, this paper proposes Phish-Jam, a mobile application for phishing detection on mobile devices that is built on a novel hybrid super learner ensemble model. The super learner ensemble combines predictions from diverse Machine Learning (ML) algorithms to classify websites as legitimate or phishing. Because the model focuses on features extracted from URLs, using handcrafted features, transformer-based text embeddings, and other Deep Learning (DL) architectures, it offers fast computation, language independence, and robustness against accidental malware downloads. In the experimental analysis, the super learner ensemble achieved an accuracy of 98.93%, precision of 99.15%, Matthews correlation coefficient (MCC) of 97.81%, and F1 score of 99.07%.
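To make the idea of handcrafted URL features concrete, the following minimal sketch computes a few lexical features from the URL string, without fetching the page. The feature names, the `SUSPICIOUS_TOKENS` list, and the `url_features` function are assumptions chosen for illustration; the abstract does not list the features Phish-Jam actually uses.

```python
import re
from urllib.parse import urlsplit

# Hypothetical token list for illustration only; not taken from the paper.
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "account", "update")

# Matches a dotted-quad host such as 192.168.0.1 (octet ranges are not checked).
IPV4_PATTERN = re.compile(r"(\d{1,3}\.){3}\d{1,3}")


def url_features(url: str) -> dict:
    """Compute simple lexical features from the URL string alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parts.scheme == "https"),
        "host_is_ipv4": int(IPV4_PATTERN.fullmatch(host) is not None),
        "num_suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
    }


if __name__ == "__main__":
    for example in (
        "https://example.com/",
        "http://192.168.0.1/secure-login/update.php",
    ):
        print(example, url_features(example))
```

In a pipeline of the kind the abstract describes, a feature dictionary like this would be turned into a numeric vector and passed, possibly alongside learned embeddings, to the base learners whose predictions the super learner combines. Because nothing is downloaded, the computation is cheap and does not depend on the language of the page.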