Zhang Zongfu, Luo Qingjia, Ying Zuobin, Chen Rongbin, Chen Hongan
College of Information Engineering, Jiangmen Polytechnic, Jiangmen, China.
Faculty of Data Science, City University of Macau, Macau, China.
PeerJ Comput Sci. 2023 Jun 26;9:e1447. doi: 10.7717/peerj-cs.1447. eCollection 2023.
The high dimensionality and complexity of network data make feature selection on such data perform poorly. To address this problem, a feature selection algorithm for high-dimensional network data based on supervised discriminant projection (SDP) is proposed. The sparse representation problem of high-dimensional network data is transformed into an Lp norm optimization problem, and sparse subspace clustering is used to cluster the data. The clustering results are then made dimensionless. Using the linear projection matrix and the optimal transformation matrix, SDP reduces the dimensionality of the dimensionless results. A sparse constraint method then performs feature selection on the high-dimensional network data and yields the feature selection results. Experiments show that the proposed algorithm can effectively cluster seven different types of data and converges when the number of iterations approaches 24. The F1 value, recall, and precision all remain at high levels. The average feature selection accuracy for high-dimensional network data is 96.9%, and the average feature selection time is 65.1 milliseconds. The algorithm therefore selects features of high-dimensional network data effectively.
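Three of the building blocks named in the abstract (the Lp norm, dimensionless processing, and sparsity-driven feature selection from a projection matrix) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say which normalization or which sparse constraint is used, so min-max scaling and ranking features by the row norms of a projection matrix are assumptions made here for illustration. The function names and the toy matrix `W` are hypothetical, and the SDP step that would actually produce `W` is not shown.

```python
import numpy as np


def lp_norm(x, p):
    """Lp norm of a vector: (sum_i |x_i|^p)^(1/p), for p > 0."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))


def min_max_scale(X):
    """Assumed form of 'dimensionless processing': rescale each column to [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (X - lo) / span


def select_features_by_row_norm(W, k):
    """Assumed selection rule: keep the k features whose rows in the
    projection matrix W (features x components) have the largest L2 norm.
    Rows that a sparse constraint drives towards zero are dropped."""
    scores = np.linalg.norm(np.asarray(W, dtype=float), axis=1)
    top_k = np.argsort(scores)[::-1][:k]
    return np.sort(top_k)


if __name__ == "__main__":
    # Toy data: 3 samples, 2 features; the second feature is constant.
    X = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
    print(min_max_scale(X))

    # Hypothetical projection matrix for 4 features and 2 components.
    W = np.array([[0.9, 0.1], [0.0, 0.0], [0.3, 0.4], [0.01, 0.02]])
    print(select_features_by_row_norm(W, k=2))

    print(lp_norm([3.0, 4.0], p=2))
```

Ranking by row norm is a common way to read feature importance from a row-sparse projection matrix, which is why it is used as the stand-in here. The paper's actual constraint may differ.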