School of Information, Zhejiang Sci-Tech University, Hangzhou, China.
School of Computer Science and Engineering, Central South University, Changsha, China.
Comput Intell Neurosci. 2021 Jul 20;2021:6911192. doi: 10.1155/2021/6911192. eCollection 2021.
Feature selection is a well-established technique for preprocessing data before a data mining task. In multivariate time series (MTS) prediction, feature selection must identify both the most relevant variables and their corresponding delays. Both aspects, to a certain extent, represent essential characteristics of the system dynamics. However, variable and delay selection for MTS is challenging when the system is nonlinear and noisy. In this paper, a multiattention-based supervised feature selection method is proposed. It translates the feature weight generation problem into a bidirectional attention generation problem with two attention modules placed in parallel. The input 2D data are sliced into 1D data along two orthogonal directions, and each attention module generates attention weights along its own dimension. To facilitate feature selection from a global perspective, we propose a global weight generation method that computes a dot product of the weight values of the two dimensions. To keep noise and duplicated features from disturbing the attention weights, the final feature weight matrix is calculated from statistics over the entire training set. Experimental results show that the proposed method achieves the best performance on synthesized, small, medium, and practical industrial datasets when compared with several state-of-the-art baseline feature selection methods.
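The global weight generation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and makes several assumptions: the two attention modules are replaced by raw score arrays supplied by the caller (in the paper they are learned), the "dot product" is read as the outer product of the delay-direction and variable-direction weight vectors, and the training-set statistic is taken to be the mean. The function names `global_feature_weights` and `top_k_features` are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_feature_weights(delay_scores, variable_scores):
    """Combine two directional attention outputs into one (T, D) weight matrix.

    delay_scores:    (N, T) raw scores over delays, one row per training sample.
    variable_scores: (N, D) raw scores over variables, one row per training sample.
    Both stand in for the outputs of the two attention modules (assumption).
    """
    a_delay = softmax(delay_scores, axis=1)        # (N, T) attention over delays
    a_var = softmax(variable_scores, axis=1)       # (N, D) attention over variables
    # Assumed reading of the "dot product": per-sample outer product, shape (N, T, D).
    per_sample = a_delay[:, :, None] * a_var[:, None, :]
    # Assumed training-set statistic: the mean over all samples.
    return per_sample.mean(axis=0)                 # (T, D)


def top_k_features(weights, k):
    """Return the k (delay, variable) index pairs with the largest weights."""
    flat_order = np.argsort(weights, axis=None)[::-1][:k]
    return [tuple(int(i) for i in np.unravel_index(idx, weights.shape))
            for idx in flat_order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_delays, n_vars = 200, 5, 4
    # Placeholder scores; a trained model would provide these.
    d_scores = rng.normal(size=(n_samples, n_delays))
    v_scores = rng.normal(size=(n_samples, n_vars))
    W = global_feature_weights(d_scores, v_scores)
    print(W.shape)
    print(top_k_features(W, 3))
```

Averaging over the whole training set, rather than trusting any single sample's attention, mirrors the abstract's stated motivation of damping the influence of noise and duplicated features; other statistics (for example the median) would fit the same interface.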