Sonowal Gunikhan
Department of Computer Science, Pondicherry University, Puducherry, India.
SN Comput Sci. 2020;1(4):191. doi: 10.1007/s42979-020-00194-z. Epub 2020 Jun 6.
Phishing has emerged as a critical issue in cybersecurity. Phishers use email as one of their main channels to lure potential victims. This paper attempts to detect phishing emails using binary search feature selection (BSFS), with the Pearson correlation coefficient as the ranking method. The proposed method draws on four sets of features: the email subject, the email body, hyperlinks, and the readability of the content. In total, 41 features were selected from these four dimensions. The results show that BSFS achieved an accuracy of 97.41%, compared with 95.63% for SFFS and 95.56% for WFS. The experiments also show that SFFS requires more time to find the optimal feature set and WFS requires the least time; however, the accuracy of WFS is lower than that of the other algorithms. The key finding is that BSFS requires the least time to identify the best feature set while achieving better accuracy, even though a few features are removed from the feature corpus.
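The abstract does not spell out the BSFS procedure, so the following is only a minimal sketch of one plausible reading: rank features by the absolute Pearson correlation with the class label, then binary-search for the shortest ranked prefix whose score matches the full feature set. The function names, the `evaluate` callback (a stand-in for training and scoring a classifier), and the `tolerance` parameter are all assumptions made for illustration, not the paper's implementation. The binary search also assumes the score does not get worse as more ranked features are added, which real classifiers do not guarantee.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no linear signal; treat it as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, labels):
    """Return feature indices sorted by |Pearson r| with the labels, strongest first."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(abs(pearson(column, labels)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def binary_search_select(ranked, evaluate, tolerance=0.0):
    """Find the shortest prefix of `ranked` scoring within `tolerance` of the full set.

    `evaluate` is a hypothetical callback: it takes a list of feature indices
    and returns a score such as cross-validated accuracy (higher is better).
    """
    target = evaluate(ranked) - tolerance
    lo, hi = 1, len(ranked)
    while lo < hi:
        mid = (lo + hi) // 2
        if evaluate(ranked[:mid]) >= target:
            hi = mid        # the top-`mid` features suffice; try fewer
        else:
            lo = mid + 1    # too few features; keep more
    return ranked[:lo]
```

In this sketch the search calls `evaluate` roughly log2(n) times instead of once per candidate feature, which is one way a binary-search strategy could take less time than a sequential forward search. In practice `evaluate` would wrap a classifier trained on the 41 features described above.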