College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing, China.
Fluid Drive and Car Equipment Technical Engineering Department, Beijing Research Institute of Automation for Machinery Industry Co., Ltd, 100120 Beijing, China.
Comput Intell Neurosci. 2022 Apr 28;2022:9248267. doi: 10.1155/2022/9248267. eCollection 2022.
Industrial control data sets have many features and substantial redundancy, which affects both the training speed and the classification results of neural network anomaly detection algorithms. However, because the features are independent of each other, dimension reduction often increases the false positive rate and the false negative rate. A feature sequencing algorithm can reduce this effect. To select an appropriate feature sequencing algorithm for each data set, this paper proposes an adaptive feature sequencing method based on data set evaluation index parameters. First, an evaluation index system is constructed from the basic information of the data set, its mathematical characteristics, and its association degree. Then, a decision tree is trained on the evaluation indices and the data labels to obtain a selection model, which selects a suitable feature sequencing algorithm. Experiments were conducted on 11 data sets, including the Batadal data set, CICIDS 2017, and the Mississippi data set, and the sequenced data sets were classified with ResNet. Over 30 generations, the accuracy on the sequenced data sets increased by 2.568% on average, and the average time per epoch decreased by 24.143%. The experiments show that the method can effectively select the feature sequencing algorithm with the best overall performance.
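The two steps described above (computing evaluation indices, then learning which sequencing algorithm to pick) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The specific indices (`redundancy`, `relevance`), the correlation-based ranking, the depth-1 tree that stands in for the paper's decision tree, and the algorithm labels `"A"` and `"B"` are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def evaluation_indices(rows, labels):
    """Illustrative (assumed) evaluation indices for one data set."""
    cols = list(zip(*rows))
    n = len(cols)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return {
        "n_samples": len(rows),    # basic information
        "n_features": n,           # basic information
        # assumed redundancy measure: mean |correlation| between feature pairs
        "redundancy": mean(abs(pearson(cols[i], cols[j])) for i, j in pairs) if pairs else 0.0,
        # assumed association measure: mean |correlation| between feature and label
        "relevance": mean(abs(pearson(c, labels)) for c in cols),
    }


def correlation_order(rows, labels):
    """One candidate sequencing rule: features sorted by |correlation with label|."""
    cols = list(zip(*rows))
    scores = [abs(pearson(c, labels)) for c in cols]
    return sorted(range(len(cols)), key=lambda i: scores[i], reverse=True)


def reorder(rows, order):
    """Apply a feature order to every sample; no feature is dropped."""
    return [tuple(row[i] for i in order) for row in rows]


def majority(labels):
    """Most common label; ties broken alphabetically for determinism."""
    counts = Counter(labels)
    return min(counts, key=lambda k: (-counts[k], k))


def train_stump(index_values, best_algorithms):
    """Depth-1 decision tree on a single index (a stand-in for a full tree)."""
    values = sorted(set(index_values))
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [a for v, a in zip(index_values, best_algorithms) if v <= threshold]
        right = [a for v, a in zip(index_values, best_algorithms) if v > threshold]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(a != l_lab for a in left) + sum(a != r_lab for a in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, l_lab, r_lab)
    if best is None:  # every data set has the same index value
        fallback = majority(best_algorithms)
        return lambda v: fallback
    _, threshold, l_lab, r_lab = best
    return lambda v: l_lab if v <= threshold else r_lab


# Toy usage: four samples, three features, binary labels (made-up numbers).
rows = [(1.0, 5.0, 0.3), (2.0, 1.0, 0.1), (3.0, 4.0, 0.4), (4.0, 2.0, 0.2)]
labels = [0, 0, 1, 1]
indices = evaluation_indices(rows, labels)
order = correlation_order(rows, labels)
sequenced = reorder(rows, order)

# Toy selector: hypothetical redundancy values for four data sets, each paired
# with the (hypothetical) algorithm label that performed best on it.
select = train_stump([0.1, 0.2, 0.7, 0.9], ["A", "A", "B", "B"])
chosen = select(indices["redundancy"])
```

In this sketch the sequencing step only permutes columns, so the classifier still sees every feature. A fuller selection model would train a decision tree over all indices at once, not a single threshold on one index.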