How to Effectively Collect and Process Network Data for Intrusion Detection?

Authors

Komisarek Mikołaj, Pawlicki Marek, Kozik Rafał, Hołubowicz Witold, Choraś Michał

Affiliations

ITTI Sp. z o.o., Rubież 46, 61-612 Poznań, Poland.

Institute of Telecommunications and Computer Science, Bydgoszcz University of Science and Technology, 85-796 Bydgoszcz, Poland.

Publication information

Entropy (Basel). 2021 Nov 18;23(11):1532. doi: 10.3390/e23111532.

DOI:10.3390/e23111532
PMID:34828230
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8619486/
Abstract

The number of security breaches in cyberspace is on the rise. This threat is met with intensive work in the intrusion detection research community. To keep the defensive mechanisms up to date and relevant, realistic network traffic datasets are needed. The use of flow-based data for machine-learning-based network intrusion detection is a promising direction for intrusion detection systems. However, many contemporary benchmark datasets do not contain features that are usable in the wild. The main contribution of this work is to cover the research gap related to identifying and investigating valuable features in the NetFlow schema that allow for effective, machine-learning-based network intrusion detection in the real world. To achieve this goal, several feature selection techniques have been applied to five flow-based network intrusion detection datasets, establishing an informative flow-based feature set. The authors' experience with the deployment of this kind of system shows that to close the research-to-market gap, and to perform actual real-world application of machine-learning-based intrusion detection, a set of labeled data from the end-user has to be collected. This research aims at establishing the appropriate, minimal amount of data that is sufficient to effectively train machine learning algorithms in intrusion detection. The results show that a set of 10 features and a small amount of data is enough for the final model to perform very well.
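
As a rough illustration of the kind of feature ranking the abstract describes, the sketch below scores each flow feature by the absolute Pearson correlation with the attack label and keeps the highest-scoring ones. The abstract does not name the selection techniques used, so this simple filter is a stand-in and not the authors' method; the feature names, the synthetic traffic generator, and the sample sizes are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, labels, names):
    """Order feature names by |correlation with the label|, strongest first."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    return sorted(names, key=lambda name: scores[name], reverse=True)


def make_synthetic_flows(n, seed=0):
    """Generate toy flow records (hypothetical features, not a real dataset)."""
    rng = random.Random(seed)
    names = ["in_bytes", "out_bytes", "flow_duration_ms", "noise_a", "noise_b"]
    rows, labels = [], []
    for _ in range(n):
        attack = rng.random() < 0.3
        rows.append([
            rng.gauss(5000 if attack else 1000, 300),  # strongly informative
            rng.gauss(200 if attack else 800, 150),    # informative
            rng.gauss(50 if attack else 60, 30),       # weakly informative
            rng.random(),                              # pure noise
            rng.random(),                              # pure noise
        ])
        labels.append(1 if attack else 0)
    return rows, labels, names


rows, labels, names = make_synthetic_flows(2000)

# Rank on all flows, then on a small labeled subset, and keep the top two.
top_all = rank_features(rows, labels, names)[:2]
top_small = rank_features(rows[:200], labels[:200], names)[:2]
print("top features (2000 flows):", top_all)
print("top features (200 flows): ", top_small)
```

Comparing the ranking on the full sample with the ranking on the first 200 flows gives a quick, informal check on whether a small labeled sample points to the same features.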


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/47c517fd004f/entropy-23-01532-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/cc705cc69b23/entropy-23-01532-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/c34529e68946/entropy-23-01532-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/905780f28a86/entropy-23-01532-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/9ab53e40b52a/entropy-23-01532-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/398d1abcea01/entropy-23-01532-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/d399add5561d/entropy-23-01532-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/e442eed2faf4/entropy-23-01532-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/523867296dd6/entropy-23-01532-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/2028fa639777/entropy-23-01532-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c068/8619486/910e02457d54/entropy-23-01532-g011.jpg


