Faculty of Engineering and Architecture, Universidad de Lima, Avenida Javier Prado Este, 4600 Lima 33, Peru.
Indra, Digital Labs, Av. de Bruselas, 35, Alcobendas, 28108 Madrid, Spain.
Sensors (Basel). 2020 Aug 12;20(16):4501. doi: 10.3390/s20164501.
Botnets are among the most recurrent cyber-threats. They take advantage of the wide heterogeneity of endpoint devices at the Edge of emerging communication environments to enable fraud and other adversarial tactics, including malware, data leaks and denial of service. There have been significant research advances in accurate botnet detection methods underpinned by supervised analysis, but assessing the accuracy and performance of such methods requires a clear evaluation model if proper defensive strategies are to be enforced. To contribute to the mitigation of botnets, this paper introduces a novel evaluation scheme grounded in supervised machine learning algorithms that enables the detection and discrimination of different botnet families in real operational environments. The proposal relies on observing, understanding and inferring the behavior of each botnet family from network indicators measured at flow level. The evaluation methodology comprises six phases that build a detection model against botnet-related malware distributed through the network; five supervised classifiers were instantiated for comparison: Decision Tree, Random Forest, Gaussian Naive Bayes, Support Vector Machine and K-Neighbors. The experimental validation was performed on two public datasets of real botnet traffic: CIC-AWS-2018 and ISOT HTTP Botnet. Given the heterogeneity of the datasets, optimizing the analysis with the Grid Search algorithm improved the classification results of the instantiated algorithms. An exhaustive evaluation demonstrated the adequacy of our proposal and showed that, among the chosen algorithms, Random Forest and Decision Tree are the most suitable models for detecting different botnet specimens. They exhibited higher precision rates while analyzing a large number of samples with less processing time. The various testing scenarios were thoroughly assessed and reported in order to set baseline results for future benchmark analyses targeting flow-based behavioral patterns.
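The evaluation idea summarized above (flow-level features, a supervised classifier, hyperparameter tuning with Grid Search under cross-validation) can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses only the Python standard library, a hand-written K-Neighbors classifier (one of the five classifiers listed), and synthetic flows. The feature set (duration, packet count, bytes per packet), the class profiles, the fold count and the candidate values of k are hypothetical assumptions chosen for illustration, not values taken from CIC-AWS-2018 or ISOT HTTP Botnet.

```python
import math
import random
from collections import Counter

# Hypothetical flow-level features: (duration_s, packets, bytes_per_packet).
# Synthetic data for illustration only; not drawn from any real dataset.
def make_synthetic_flows(n_per_class, seed=0):
    rng = random.Random(seed)
    # Assumed class profiles: label -> mean of each feature.
    profiles = {
        "benign": (30.0, 40.0, 600.0),
        "botnet_a": (2.0, 8.0, 120.0),
        "botnet_b": (60.0, 5.0, 80.0),
    }
    data = []
    for label, means in profiles.items():
        for _ in range(n_per_class):
            x = tuple(rng.gauss(m, 0.15 * m) for m in means)
            data.append((x, label))
    rng.shuffle(data)
    return data

def standardize(train, test):
    """Z-score each feature using statistics from the training fold only."""
    dims = len(train[0][0])
    means = [sum(x[d] for x, _ in train) / len(train) for d in range(dims)]
    stds = [
        math.sqrt(sum((x[d] - means[d]) ** 2 for x, _ in train) / len(train)) or 1.0
        for d in range(dims)
    ]

    def scale(x):
        return tuple((x[d] - means[d]) / stds[d] for d in range(dims))

    return [(scale(x), y) for x, y in train], [(scale(x), y) for x, y in test]

def knn_predict(train, x, k):
    """Majority vote among the k nearest training flows (Euclidean distance)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy of k-NN over contiguous cross-validation folds."""
    fold_size = len(data) // folds
    correct = total = 0
    for f in range(folds):
        test = data[f * fold_size:(f + 1) * fold_size]
        train = data[:f * fold_size] + data[(f + 1) * fold_size:]
        train_s, test_s = standardize(train, test)
        for x, y in test_s:
            correct += knn_predict(train_s, x, k) == y
            total += 1
    return correct / total

def grid_search(data, param_grid):
    """Exhaustively score every candidate k and keep the best one."""
    scores = {k: cross_val_accuracy(data, k) for k in param_grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores

if __name__ == "__main__":
    flows = make_synthetic_flows(n_per_class=40)
    best_k, scores = grid_search(flows, param_grid=[1, 3, 5, 7, 9])
    print("accuracy per k:", scores)
    print("best k:", best_k)
```

The same search loop applies to any of the other classifiers by swapping the prediction function and the parameter grid; standardization is fitted per fold so that test flows never influence the scaling statistics.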