Feature Sequencing Method of Industrial Control Data Set Based on Multidimensional Evaluation Parameters.

Affiliations

College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing, China.

Fluid Drive and Car Equipment Technical Engineering Department, Beijing Research Institute of Automation for Machinery Industry Co., Ltd, 100120 Beijing, China.

Publication information

Comput Intell Neurosci. 2022 Apr 28;2022:9248267. doi: 10.1155/2022/9248267. eCollection 2022.

DOI:10.1155/2022/9248267

PMID:35528350

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9071983/

Abstract

Industrial control data sets have many features and substantial redundancy, which affects both the training speed and the classification results of neural-network anomaly detection algorithms. However, the features are independent of each other, and dimension reduction often increases the false positive rate and the false negative rate. A feature sequencing algorithm can reduce this effect. To select an appropriate feature sequencing algorithm for each data set, this paper proposes an adaptive feature sequencing method based on data set evaluation index parameters. First, an evaluation index system is constructed from the basic information of the data set, its mathematical characteristics, and its association degree. Then, a decision tree is trained on the data labels and the evaluation indices to obtain a selection model, which chooses a suitable feature sequencing algorithm. Experiments were conducted on 11 data sets, including the Batadal data set, CICIDS 2017, and the Mississippi data set, and the sequenced data sets were classified with ResNet. Over 30 generations, the accuracy on the sequenced data sets increased by 2.568% on average, and the time per epoch decreased by 24.143% on average. The experiments show that the method can effectively select the feature sequencing algorithm with the best comprehensive performance.
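The abstract does not list the individual evaluation indices or the candidate sequencing algorithms, so the following is only a minimal sketch under stated assumptions. The specific indices (`mean_variance`, `mean_abs_feature_corr`, `mean_abs_label_corr`), the correlation-based ordering, and all function names are illustrative choices, not the authors' definitions, and the toy values are made up. The sketch shows how a data set could be summarised by the three index groups named above, and how features could be reordered without dropping any. The decision-tree selector itself is not shown.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def evaluation_indices(rows, labels):
    """Summarise a data set with three illustrative (assumed) groups of indices."""
    cols = list(zip(*rows))
    n_feat = len(cols)
    pairs = [(i, j) for i in range(n_feat) for j in range(i + 1, n_feat)]
    return {
        # Basic information
        "n_samples": len(rows),
        "n_features": n_feat,
        # Mathematical characteristics
        "mean_variance": mean(pvariance(c) for c in cols),
        # Association degree: feature-to-feature and feature-to-label
        "mean_abs_feature_corr": (
            mean(abs(pearson(cols[i], cols[j])) for i, j in pairs) if pairs else 0.0
        ),
        "mean_abs_label_corr": mean(abs(pearson(c, labels)) for c in cols),
    }


def rank_by_label_correlation(rows, labels):
    """One possible sequencing: feature indices by |corr(feature, label)|, descending."""
    cols = list(zip(*rows))
    scores = [abs(pearson(c, labels)) for c in cols]
    return sorted(range(len(cols)), key=lambda i: scores[i], reverse=True)


def reorder(rows, order):
    """Reorder the columns of every row; no feature is dropped."""
    return [[row[i] for i in order] for row in rows]


# Toy data (made up): 4 samples, 3 features, binary labels.
rows = [
    [5.0, 1.0, 0.0],
    [5.0, 2.0, 0.0],
    [4.0, 3.0, 1.0],
    [6.0, 4.0, 1.0],
]
labels = [0, 0, 1, 1]

indices = evaluation_indices(rows, labels)
order = rank_by_label_correlation(rows, labels)
reordered = reorder(rows, order)
print(indices)
print(order)
```

In the method described above, a dictionary like `indices` would be the input to the trained selection model, and the model's output would decide which sequencing function to apply. Here a single ordering is hard-coded for illustration.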

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd4f/9071983/e22a9ac8cef2/CIN2022-9248267.001.jpg

Cited by

Retracted: Feature Sequencing Method of Industrial Control Data Set Based on Multidimensional Evaluation Parameters.

Comput Intell Neurosci. 2023 Jul 26;2023:9847359. doi: 10.1155/2023/9847359. eCollection 2023.
