IEEE Trans Neural Netw Learn Syst. 2013 Dec;24(12):1999-2012. doi: 10.1109/TNNLS.2013.2271327.
Graph-based approaches have been the most successful in semisupervised learning. In this paper, we focus on label propagation in graph-based semisupervised learning. One essential point of label propagation is that its performance depends heavily on how well the underlying manifold of the given data is incorporated into the input graph. A second, more important point is that in many recent real-world applications, the same instances are represented by multiple heterogeneous data sources. A key challenge in this setting is to integrate the different data representations automatically to achieve better predictive performance. Specifically, we address the problem of obtaining the optimal linear combination of multiple different graphs under the label propagation setting. For this problem, we propose a new formulation with a sparsity property (in the coefficients of the graph combination) that cannot be properly achieved by any existing method. This unique feature provides two important advantages: 1) improved prediction performance through the elimination of irrelevant or noisy graphs and 2) interpretability of the results, i.e., easy identification of the graphs that are informative for classification. We propose efficient optimization algorithms for the proposed approach, which also provide a clear interpretation of the mechanism behind the sparsity. Using various synthetic data sets and two real-world data sets, we empirically demonstrate the advantages of our proposed approach not only in prediction performance but also in graph selection ability.
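To make the setting concrete, the following is a minimal sketch of label propagation on a linear combination of graphs. It is not the formulation or the optimization algorithm proposed in the paper: the coefficients `mu` are fixed by hand rather than learned, the propagation rule (solving a Laplacian-regularized linear system) is one common choice assumed here, and the names `graph_laplacian`, `propagate_labels`, and the parameter `c` are hypothetical. It only illustrates why a sparse coefficient vector that drops a noisy graph can help.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W of a symmetric affinity matrix W."""
    return np.diag(W.sum(axis=1)) - W

def propagate_labels(graphs, mu, y, c=1.0):
    """Solve (I + c * sum_k mu_k * L_k) f = y for the label scores f.

    graphs: list of symmetric affinity matrices (one per data source)
    mu:     nonnegative combination coefficients (assumed given, not learned)
    y:      label vector with +1 / -1 for labeled nodes and 0 for unlabeled
    c:      regularization strength (an assumed hyperparameter)
    """
    n = len(y)
    L = sum(m * graph_laplacian(W) for m, W in zip(mu, graphs))
    return np.linalg.solve(np.eye(n) + c * L, y)

def build_graph(n, edges):
    """Build a symmetric 0/1 affinity matrix from an edge list."""
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0
    return W

# Toy data: nodes 0-2 belong to one class, nodes 3-5 to the other.
informative = build_graph(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])
# A noisy graph whose edges all connect nodes of opposite classes.
noisy = build_graph(6, [(0, 3), (1, 4), (2, 5)])

# Only node 0 (+1) and node 3 (-1) are labeled.
y = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])

# Sparse coefficients discard the noisy graph; uniform ones keep it.
f_sparse = propagate_labels([informative, noisy], [1.0, 0.0], y)
f_uniform = propagate_labels([informative, noisy], [0.5, 0.5], y)

print("sparse :", np.round(f_sparse, 3))
print("uniform:", np.round(f_uniform, 3))
```

In this toy case both coefficient choices recover the correct signs, but the scores of the unlabeled nodes are pulled much closer to zero when the noisy graph is kept, which gives a rough intuition for why eliminating irrelevant graphs can improve prediction.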