ETSI Telecomunicación, Universidad Politécnica de Madrid (UPM), Avda. Complutense 30, 28040 Madrid, Spain.
Sensors (Basel). 2021 Jan 19;21(2):656. doi: 10.3390/s21020656.
Security is now a mandatory requirement in IoT networks because of the large volume of data they handle. These systems are vulnerable to cybersecurity attacks that are growing in both number and sophistication, so new intrusion detection techniques that are as accurate as possible for these scenarios must be developed. Intrusion detection systems based on machine learning algorithms have already shown high accuracy. This research studies and evaluates several preprocessing techniques, based on traffic categorization, for a machine learning neural network algorithm. The evaluation uses two benchmark datasets, UGR16 and UNSW-NB15, and one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated using scaling and normalization functions. All of these preprocessing models were applied to different sets of features drawn from a categorization composed of four groups: basic connection features, content features, statistical features and, finally, a group composed of traffic-based features and connection direction-based traffic features. The objective of this research is to evaluate this categorization under various data preprocessing techniques in order to obtain the most accurate model. Our proposal shows that applying the network traffic categorization together with several preprocessing techniques can enhance accuracy by up to 45%. Preprocessing a specific group of features yields greater accuracy, allowing the machine learning algorithm to correctly classify the parameters related to possible attacks.
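As an illustration of preprocessing one feature group at a time, the following minimal sketch applies a scaling function (min-max) or a standardization function (z-score) only to the columns of a chosen group. It is not the authors' implementation: the group names, column names, sample values, and helper functions are all hypothetical and chosen only to show the idea.

```python
from statistics import mean, pstdev

# Hypothetical feature groups; the column names are illustrative
# and are not taken from UGR16, UNSW-NB15, or KDD99 as used in the paper.
FEATURE_GROUPS = {
    "basic": ["duration", "src_bytes"],
    "content": ["num_failed_logins"],
}


def min_max_scale(values):
    """Rescale a column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale a column to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def preprocess_group(rows, columns, transform):
    """Return copies of rows with transform applied column-wise
    to the given columns only; other columns are left unchanged."""
    out = [dict(r) for r in rows]
    for col in columns:
        transformed = transform([r[col] for r in rows])
        for r, v in zip(out, transformed):
            r[col] = v
    return out


# Made-up sample records, for demonstration only.
rows = [
    {"duration": 0.0, "src_bytes": 100.0, "num_failed_logins": 0.0},
    {"duration": 5.0, "src_bytes": 300.0, "num_failed_logins": 2.0},
    {"duration": 10.0, "src_bytes": 500.0, "num_failed_logins": 4.0},
]

# Scale only the "basic" group; the "content" group stays untouched.
scaled_rows = preprocess_group(rows, FEATURE_GROUPS["basic"], min_max_scale)
```

Passing `standardize` instead of `min_max_scale`, or a different entry of `FEATURE_GROUPS`, gives another preprocessing variant, so each group and function combination can be compared on the same classifier.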