Discriminative semi-supervised feature selection via manifold regularization.

Authors

Xu Zenglin, King Irwin, Lyu Michael Rung-Tsong, Jin Rong

Affiliation

Cluster of Excellence, Saarland University, Max Planck Institute for Informatics, Saarbruecken 66123, Germany.

Publication information

IEEE Trans Neural Netw. 2010 Jul;21(7):1033-47. doi: 10.1109/TNN.2010.2047114. Epub 2010 Jun 21.

Abstract

Feature selection has attracted a huge amount of interest in both the research and application communities of data mining. We consider the problem of semi-supervised feature selection, where we are given a small number of labeled examples and a large number of unlabeled examples. Since a small number of labeled samples is usually insufficient for identifying the relevant features, the critical problem in semi-supervised feature selection is how to take advantage of the information contained in the unlabeled data. To address this problem, we propose a novel discriminative semi-supervised feature selection method based on the idea of manifold regularization. The proposed approach selects features by maximizing the classification margin between different classes while simultaneously exploiting the geometry of the probability distribution that generates both labeled and unlabeled data. In contrast to previous semi-supervised feature selection algorithms, our method is an embedded feature selection method and is able to find more discriminative features. We formulate the proposed feature selection method as a convex-concave optimization problem, where the saddle point corresponds to the optimal solution. To find the optimal solution, the level method, a fairly recent optimization method, is employed. We also present a theoretical proof of the convergence rate for the application of the level method to our problem. Empirical evaluation on several benchmark data sets demonstrates the effectiveness of the proposed semi-supervised feature selection method.
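
The abstract does not give the objective itself, so the following is only a minimal sketch of the general manifold-regularization idea it builds on, not the authors' formulation. It assumes a symmetrized k-nearest-neighbor graph with binary edge weights built over labeled and unlabeled points together, and the usual smoothness penalty f^T L f with L = D - W. The function names, the choice of k, and the toy data are all hypothetical.

```python
import numpy as np


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph (binary weights)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest points
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)  # keep an edge if either endpoint selected it
    return np.diag(W.sum(axis=1)) - W


def manifold_penalty(f, L):
    """Smoothness penalty f^T L f = 0.5 * sum_ij W_ij * (f_i - f_j)^2."""
    return float(f @ L @ f)


# Toy data (assumption): two tight, well-separated clusters of 10 points each.
# Labels are not needed to build the graph, so unlabeled points contribute too.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.vstack([c + rng.uniform(-0.1, 0.1, size=(10, 2)) for c in centers])

L = knn_graph_laplacian(X, k=3)

# A prediction vector that is constant on each cluster incurs no penalty,
# while one that differs across every edge of the graph is penalized.
f_smooth = np.repeat([0.0, 1.0], 10)
f_varying = np.arange(20, dtype=float)

print("penalty (cluster-wise constant):", manifold_penalty(f_smooth, L))
print("penalty (varies along edges):  ", manifold_penalty(f_varying, L))
```

In a semi-supervised learner, a term of this kind is typically added to a margin-based loss on the labeled points, so that the decision function is encouraged to vary slowly along the data geometry estimated from all points.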
