Evaluation of Federated Learning in Phishing Email Detection.

Commonwealth Scientific and Industrial Research Organisation, Data61, Sydney 2122, Australia.

School of Chemical Engineering, The University of New South Wales, Sydney 2052, Australia.

Sensors (Basel). 2023 Apr 27;23(9):4346. doi: 10.3390/s23094346.

The use of artificial intelligence (AI) to detect phishing emails depends primarily on large-scale centralized datasets, which exposes it to a myriad of privacy, trust, and legal issues. Moreover, organizations have been loath to share emails, given the risk of leaking commercially sensitive information. Consequently, it has been difficult to obtain sufficient emails to train a global AI model efficiently. Accordingly, privacy-preserving distributed and collaborative machine learning, particularly federated learning (FL), is a desideratum. Although FL is already prevalent in the healthcare sector, questions remain regarding the effectiveness and efficacy of FL-based phishing detection within the context of multi-organization collaborations. To the best of our knowledge, the work herein was the first to investigate the use of FL in phishing email detection. This study focused on building upon deep neural network models, particularly recurrent convolutional neural network (RNN) and bidirectional encoder representations from transformers (BERT), for phishing email detection. We analyzed the FL-entangled learning performance in various settings, including (i) balanced and asymmetrical data distributions among organizations and (ii) scalability. Our results showed that, for balanced datasets and low organizational counts, FL achieved performance comparable to centralized learning in phishing email detection. Moreover, we observed a variation in performance when increasing the organizational counts. For a fixed total email dataset, the global RNN-based model had a 1.8% accuracy decrease when the organizational counts were increased from 2 to 10. In contrast, BERT accuracy increased by 0.6% when increasing organizational counts from 2 to 5. However, if we increased the overall email dataset by introducing new organizations in the FL framework, the organizational-level performance improved by achieving a faster convergence speed. In addition, the overall global model performance of FL suffered from highly unstable outputs when the email dataset distribution was highly asymmetric.
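
The abstract does not state which aggregation rule the FL framework used. As a minimal sketch only, assuming a FedAvg-style size-weighted average of locally trained parameters (a common choice, not confirmed by the source), the following shows how balanced versus asymmetrical email counts change each organization's influence on the global model. The function name `fed_avg`, the toy two-parameter updates, and the sample counts are all hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_params: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Size-weighted average of client parameter vectors (FedAvg-style).

    Assumption: each organization sends its locally trained parameters and
    its local sample count; raw emails never leave the organization.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all parameter vectors must have the same length")
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical local updates from three organizations (toy 2-parameter model).
local_updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Balanced split: every organization holds the same number of emails.
balanced = fed_avg(local_updates, [100, 100, 100])

# Asymmetrical split: one organization holds most of the emails,
# so its update dominates the aggregated global parameters.
asymmetric = fed_avg(local_updates, [10, 10, 280])

print("balanced  :", balanced)
print("asymmetric:", asymmetric)
```

Under this assumed rule, the organization holding 280 of the 300 emails pulls the global parameters almost entirely toward its own update, which is one plausible way a highly asymmetric distribution could skew a global model. The sketch does not reproduce the study's models or results.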
