Čenys Antanas, Hora Simran Kaur, Goranin Nikolaj
Department of Information Systems, Vilnius Gediminas Technical University, 10223 Vilnius, Lithuania.
Sensors (Basel). 2025 Aug 15;25(16):5077. doi: 10.3390/s25165077.
Due to the rapid expansion of Internet of Things (IoT) devices and their associated networks, security has become a critical concern, necessitating the development of reliable security mechanisms. Anomaly-based network intrusion detection systems (NIDS) leveraging machine learning and deep learning have emerged as key solutions for detecting abnormal network traffic patterns. However, one challenge that lowers the detection rate of such systems is the class imbalance present in existing datasets, which are crucial for the development and evaluation of anomaly-based NIDS for IoT systems. In this study, we introduce EmuIoT-VT, a dataset generated by creating virtual replicas of IoT devices with a novel emulation-based method, enabling realistic network traffic generation without relying on any external network emulators. The data were collected in an isolated offline environment to capture clean, uncontaminated network traffic. EmuIoT-VT is balanced by design: it contains 28,000 labeled records that are evenly distributed across devices, classes, and subclasses, and it supports both binary and multiclass classification tasks. It provides 82 features extracted from raw PCAP data and covers attack categories such as DoS, brute force, reconnaissance, and exploitation. This article presents the novel method and the creation of the EmuIoT-VT dataset, detailing the data collection, the balancing strategy, and the dataset structure, and proposes directions for future work.