Venturi Andrea, Apruzzese Giovanni, Andreolini Mauro, Colajanni Michele, Marchetti Mirco
Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Italy.
Hilti Chair of Data and Application Security, University of Liechtenstein, Vaduz, Liechtenstein.
Data Brief. 2020 Dec 8;34:106631. doi: 10.1016/j.dib.2020.106631. eCollection 2021 Feb.
We present the first dataset that aims to serve as a benchmark for validating the resilience of botnet detectors against adversarial attacks. This dataset includes realistic adversarial samples generated by leveraging two widely used Deep Reinforcement Learning (DRL) techniques. These adversarial samples are proven to evade state-of-the-art detectors based on Machine- and Deep-Learning algorithms. The initial corpus of malicious samples consists of network flows belonging to different botnet families drawn from three public datasets containing real enterprise network traffic. We use these datasets to devise detectors capable of achieving state-of-the-art performance. We then train two DRL agents, each based on one of these techniques, to generate realistic adversarial samples: the goal is to induce misclassifications through small modifications of the initial malicious samples. These alterations involve only the features that an expert attacker can realistically alter, and they do not compromise the underlying malicious logic of the original samples. Our dataset represents an important contribution to the cybersecurity research community, as it is the first to include thousands of automatically generated adversarial samples that thwart state-of-the-art classifiers with a high evasion rate. The adversarial samples are grouped by malware variant and provided in CSV file format. Researchers can validate their defensive proposals by testing their detectors against the adversarial samples of the proposed dataset. Moreover, the analysis of these samples can pave the way to a deeper comprehension of adversarial attacks and toward explainability of machine-learning defensive algorithms. They can also support the definition of novel, effective defensive techniques.
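As a minimal sketch of how such a CSV of adversarial flows might be consumed, the snippet below loads flow records and measures a detector's evasion rate (the fraction of known-malicious adversarial flows the detector misses). The file layout, column names, and the toy threshold detector are all hypothetical; consult the dataset's accompanying documentation for the actual schema.

```python
import csv
import io

# Hypothetical excerpt mimicking the dataset's layout: one network flow per
# row, grouped by botnet variant (column names are illustrative only).
SAMPLE_CSV = """variant,duration,src_bytes,dst_bytes,packets
Neris,2.31,1040,220,12
Neris,1.87,980,198,10
Rbot,0.45,15000,300,45
"""

def load_flows(text):
    """Parse adversarial flow records from CSV text into dicts with numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["duration"] = float(row["duration"])
        row["src_bytes"] = int(row["src_bytes"])
        rows.append(row)
    return rows

def toy_detector(flow):
    """Placeholder classifier: flags a flow as malicious above a byte threshold.

    A real evaluation would substitute a trained ML/DL botnet detector here.
    """
    return flow["src_bytes"] > 5000  # True = detected as malicious

def evasion_rate(flows, detector):
    """Fraction of (known-malicious) adversarial flows the detector misses."""
    missed = sum(1 for f in flows if not detector(f))
    return missed / len(flows)

flows = load_flows(SAMPLE_CSV)
print(f"evasion rate: {evasion_rate(flows, toy_detector):.2f}")
```

In this toy run, two of the three adversarial flows slip past the threshold detector, giving an evasion rate of 0.67; swapping in a real detector and the published samples follows the same pattern.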