How to Effectively Collect and Process Network Data for Intrusion Detection?

Author information

Komisarek Mikołaj, Pawlicki Marek, Kozik Rafał, Hołubowicz Witold, Choraś Michał

Affiliations

ITTI Sp. z o.o., Rubież 46, 61-612 Poznań, Poland.

Institute of Telecommunications and Computer Science, Bydgoszcz University of Science and Technology, 85-796 Bydgoszcz, Poland.

Publication information

Entropy (Basel). 2021 Nov 18;23(11):1532. doi: 10.3390/e23111532.

DOI:10.3390/e23111532

PMID:34828230

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8619486/

Abstract

The number of security breaches in cyberspace is rising, and the intrusion detection research community is responding with intensive work. Realistic network traffic datasets are needed to keep defensive mechanisms up to date and relevant. Using flow-based data for machine-learning-based network intrusion detection is a promising direction for intrusion detection systems. However, many contemporary benchmark datasets do not contain features that are usable in the wild. The main contribution of this work is to close the research gap in identifying and investigating the valuable features of the NetFlow schema that enable effective, machine-learning-based network intrusion detection in the real world. To this end, several feature selection techniques were applied to five flow-based network intrusion detection datasets, yielding an informative flow-based feature set. The authors' experience deploying such systems shows that a set of labeled data must be collected from the end user in order to close the research-to-market gap and apply machine-learning-based intrusion detection in the real world. This research therefore aims to establish the minimal amount of data that is sufficient to effectively train machine learning algorithms for intrusion detection. The results show that a set of 10 features and a small amount of data are enough for the final model to perform very well.
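The abstract does not say which feature selection techniques the authors applied. The sketch below is therefore a hypothetical illustration of one common filter-style technique: rank NetFlow-style fields by their mutual information with the attack label, then keep the top k fields. The field names follow NetFlow v9-style naming (IN_BYTES, OUT_BYTES, PROTOCOL, L4_DST_PORT). The log2 bucketing used for discretization, the helper names and the four toy flows are all assumptions made for illustration. They are not the paper's data, features or pipeline.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def log2_bucket(value: int) -> int:
    """Hypothetical discretizer: coarse log2 bucket (0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3, ...)."""
    return int(value).bit_length()


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Mutual information I(X;Y), in bits, between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def rank_features(flows: List[Dict[str, int]], labels: Sequence[int], k: int) -> List[str]:
    """Return the k field names with the highest MI against the labels (assumed helper)."""
    names = sorted(flows[0])
    scores = {
        name: mutual_information([log2_bucket(f[name]) for f in flows], labels)
        for name in names
    }
    # Highest MI first; ties are broken alphabetically so the output is deterministic.
    return sorted(names, key=lambda name: (-scores[name], name))[:k]


# Toy, made-up flows: label 1 = attack-like (tiny, unanswered), 0 = benign.
flows = [
    {"IN_BYTES": 40, "OUT_BYTES": 0, "PROTOCOL": 6, "L4_DST_PORT": 80},
    {"IN_BYTES": 44, "OUT_BYTES": 0, "PROTOCOL": 6, "L4_DST_PORT": 443},
    {"IN_BYTES": 5000, "OUT_BYTES": 120000, "PROTOCOL": 6, "L4_DST_PORT": 443},
    {"IN_BYTES": 6200, "OUT_BYTES": 90000, "PROTOCOL": 6, "L4_DST_PORT": 80},
]
labels = [1, 1, 0, 0]

print(rank_features(flows, labels, k=2))  # ['IN_BYTES', 'OUT_BYTES']
```

On the toy flows, IN_BYTES and OUT_BYTES each separate the two classes perfectly, so each carries 1 bit of information about the label. PROTOCOL is constant and L4_DST_PORT is balanced across both classes, so both score 0. On real data, k would be a tuning parameter; the abstract reports that 10 features were sufficient.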
