Du Xingzhong, Nie Feiping, Wang Weiqing, Yang Yi, Zhou Xiaofang
IEEE Trans Neural Netw Learn Syst. 2019 Jan;30(1):201-214. doi: 10.1109/TNNLS.2018.2837100. Epub 2018 Jun 7.
In learning applications, exploring the cluster structure of high-dimensional data is an important task. It requires projecting or visualizing that structure in a low-dimensional space. The challenges are: 1) how to perform the projection or visualization with little information loss and 2) how to preserve the interpretability of the original data. Recent methods address both challenges at once through unsupervised feature selection. They learn cluster indicators from a k-nearest-neighbor similarity graph, then select the features that are highly correlated with these indicators. Along this direction, many techniques, such as local discriminative analysis, nonnegative spectral analysis, and nonnegative matrix factorization, have been successfully introduced to make the selection more accurate. In this paper, we focus on enhancing unsupervised feature selection from another perspective, namely, making the selection exploit the combination effect of the features. Given the desired number of features, previous works operate on the whole feature set and then select the features with the highest coefficients one by one as the output. Our proposed method instead starts from a group of features and updates the selection whenever a better group appears. Compared with the previous methods, the proposed method exploits the combination effect of the features through the ℓ2,0 norm. It improves selection accuracy in cases where the cluster structure is strongly related to a group of features. We conduct experiments on six open-access data sets from different domains. The experimental results show that our proposed method is more accurate than recent methods that do not specifically consider the combination effect of the features.
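The baseline pipeline that the abstract attributes to earlier methods (k-nearest-neighbor graph, cluster indicators, features ranked one by one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the spectral embedding, the ridge regression, and the function names `knn_affinity`, `cluster_indicators`, `rank_features`, `l20_norm`, and `keep_top_rows` are all choices made here for illustration, and the synthetic data is invented. The ℓ2,0 norm of a coefficient matrix is the number of its nonzero rows, so constraining it to k means exactly k features are kept as a group; the paper's group-update solver is not reproduced.

```python
import numpy as np

def knn_affinity(X, k=10):
    """Symmetric 0/1 k-nearest-neighbor affinity matrix (Euclidean distance)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]       # k closest points per row
    n = X.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                 # symmetrize (union of neighbors)

def cluster_indicators(A, c):
    """Relaxed cluster indicators: the c eigenvectors of the normalized
    graph Laplacian with the smallest eigenvalues."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    return vecs[:, :c]

def rank_features(X, F, ridge=1e-3):
    """Ridge-regress the indicators F on standardized features and score
    each feature by the l2 norm of its row in the coefficient matrix W."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    d = X.shape[1]
    W = np.linalg.solve(Xs.T @ Xs + ridge * np.eye(d), Xs.T @ F)
    return W, np.linalg.norm(W, axis=1)

def l20_norm(W, tol=1e-12):
    """l2,0 norm: the number of rows of W with nonzero l2 norm."""
    return int(np.sum(np.linalg.norm(W, axis=1) > tol))

def keep_top_rows(W, k):
    """Zero out all but the k rows with the largest l2 norm."""
    keep = np.argsort(np.linalg.norm(W, axis=1))[::-1][:k]
    out = np.zeros_like(W)
    out[keep] = W[keep]
    return out

# Synthetic data (invented for this sketch): two clusters that differ
# only in features 0 and 1; features 2..5 are pure noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 6))
X[:, :2] += 10.0 * labels[:, None]

F = cluster_indicators(knn_affinity(X, k=10), c=2)
W, scores = rank_features(X, F)
top = np.argsort(scores)[::-1][:2]            # one-by-one ranking baseline
W_k = keep_top_rows(W, 2)
print("selected features:", sorted(top.tolist()))
print("l2,0 norm after keeping 2 rows:", l20_norm(W_k))
```

Ranking by row norm scores each feature on its own, which is the one-by-one behavior the abstract contrasts with. A group-based method would instead compare candidate sets of k rows against each other, which is what an ℓ2,0 constraint on `W` expresses.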