Feng Jian, Qiao Yuqiang, Ye Ou, Zhang Ying
College of Computer Science & Technology, Xi'an University of Science and Technology, Xi'an, Shaanxi, China.
Information Technology Department for Head Office of SPD Bank, National Institute of Standards and Technology Application Development Service Sub-centre (Xi'an), Xi'an, Shaanxi, China.
PeerJ Comput Sci. 2022 Feb 1;8:e868. doi: 10.7717/peerj-cs.868. eCollection 2022.
Phishing webpages are often generated by phishing kits or evolved from existing kits. Homology analysis of phishing webpages can therefore help curb their proliferation at the source. Based on the observation that phishing webpages belonging to the same family have similar page structures, a homology detection method that clusters webpages by structural similarity is proposed. The method consists of two stages. The first stage constructs the model: it extracts the structural features and style attributes of each webpage from its document structure and vectorizes them, assigns different weights to the different features, and uses a webpage difference index to measure the similarity between webpages and guide their clustering. The second stage detects the webpages under test: a fingerprint generation algorithm based on double compression generates fingerprints for the cluster centres and for the webpages under test, and bitwise comparison of these fingerprints accelerates detection. Experiments show that, compared with existing methods, the proposed method can accurately locate the family of a phishing webpage and detect phishing webpages efficiently.
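The first stage can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define the feature set, the weights, the webpage difference index, or the clustering algorithm. The sketch therefore assumes root-to-node tag-path counts as structural features, class names and inline style declarations as style attributes, a weighted sum of cosine distances as a stand-in difference index (hypothetical weights 0.7 and 0.3), and simple threshold-based leader clustering. All names (`extract`, `difference_index`, `cluster`, and so on) are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StructureExtractor(HTMLParser):
    """Collects structural features (root-to-node tag paths) and style attributes."""

    def __init__(self):
        super().__init__()
        self.stack = []             # currently open elements
        self.structure = Counter()  # tag path -> occurrence count
        self.style = Counter()      # class name / inline style declaration -> count

    def handle_starttag(self, tag, attrs):
        self.structure["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)
        for name, value in attrs:
            if name == "class" and value:
                self.style.update("class:" + c for c in value.split())
            elif name == "style" and value:
                self.style.update("style:" + d.strip() for d in value.split(";") if d.strip())

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def extract(html):
    """Parse one webpage into its structural and style feature vectors."""
    parser = StructureExtractor()
    parser.feed(html)
    parser.close()
    return parser


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors (0 means same direction)."""
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0 if na == nb else 1.0
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    return max(0.0, 1.0 - dot / (na * nb))


def difference_index(p, q, w_structure=0.7, w_style=0.3):
    """Hypothetical weighted difference index: structure weighted above style."""
    return (w_structure * cosine_distance(p.structure, q.structure)
            + w_style * cosine_distance(p.style, q.style))


def cluster(pages, threshold=0.3):
    """Leader clustering: join the first centre within `threshold`, else start a new cluster."""
    centres, labels = [], []
    for page in pages:
        for idx, centre in enumerate(centres):
            if difference_index(page, centre) <= threshold:
                labels.append(idx)
                break
        else:
            centres.append(page)
            labels.append(len(centres) - 1)
    return centres, labels
```

With these assumptions, two login forms that differ only in visible text and input names fall into the same cluster, while a page with a different layout starts a new one. Leader clustering is used only for brevity: each centre is simply the first page assigned to it, not a computed centroid.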