An end-to-end framework for private DGA detection as a service.

Affiliations

Department of Computer Science, University of Brasilia, Federal District, Brasília, Brazil.

School of Engineering, University of Washington Tacoma, Tacoma, Washington, United States of America.

Publication information

PLoS One. 2024 Aug 28;19(8):e0304476. doi: 10.1371/journal.pone.0304476. eCollection 2024.

DOI:10.1371/journal.pone.0304476

PMID:39196905

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11355532/

Abstract

Domain Generation Algorithms (DGAs) are used by malware to generate pseudorandom domain names to establish communication between infected bots and command-and-control servers. While DGAs can be detected by machine learning (ML) models with high accuracy, offering DGA detection as a service raises privacy concerns, because it requires network administrators to disclose their DNS traffic to the service provider. The main scientific contribution of this paper is the first end-to-end framework for privacy-preserving classification as a service of domain names into DGA (malicious) or non-DGA (benign) domains. Our framework achieves these goals through carefully designed protocols that combine two privacy-enhancing technologies (PETs), namely secure multi-party computation (MPC) and differential privacy (DP). Through MPC, our framework enables an enterprise network administrator to outsource the problem of classifying a DNS (Domain Name System) domain as DGA or non-DGA to an external organization without revealing any information about the domain name. Moreover, the service provider's ML model used for DGA detection is never revealed to the network administrator. Furthermore, by using DP, we ensure that the classification result cannot be used to learn information about individual entries of the training data. Finally, we leverage post-training float16 quantization of deep learning models in MPC to achieve efficient, secure DGA detection. We demonstrate that quantization achieves a significant speed-up, reducing inference runtime by 23% to 42% without reducing accuracy, using a three-party secure computation protocol tolerating one corruption. Previous solutions are not end-to-end private, do not provide differential privacy guarantees for the model's outputs, and assume that model embeddings are publicly known. Our best protocol in terms of accuracy runs in about 0.22 s.
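To make the MPC idea in the abstract concrete, the following is a minimal sketch of how a domain name could be split into secret shares among three computing parties so that no single party sees the name. It uses plain additive secret sharing over the ring of 64-bit integers, which is an assumption made for illustration: it is not the specific three-party protocol the paper evaluates. The names `MODULUS`, `share`, `reconstruct`, `share_domain`, and `local_add` are hypothetical.

```python
import secrets

# Assumption for this sketch: arithmetic is done modulo 2^64.
MODULUS = 2 ** 64


def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the original integer."""
    return sum(shares) % MODULUS


def share_domain(domain):
    """Secret-share each character code of a domain name.

    Returns one list per party; party i holds the i-th share of every
    character, which on its own is a uniformly random ring element.
    """
    per_char = [share(ord(ch)) for ch in domain]
    return [list(col) for col in zip(*per_char)]


def local_add(shares_a, shares_b):
    """Add two shared values share-wise; each party adds its own two shares locally."""
    return [(a + b) % MODULUS for a, b in zip(shares_a, shares_b)]


if __name__ == "__main__":
    party_views = share_domain("example.com")
    # Only by combining all three views can the name be recovered.
    recovered = "".join(chr(reconstruct(col)) for col in zip(*party_views))
    print(recovered)
```

Additions on shared values need no communication, as `local_add` shows. Multiplications and non-linear layers of a neural network require interactive sub-protocols, which is where most of the inference cost in MPC typically arises.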
