IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1575-87. doi: 10.1109/TNNLS.2013.2261613.
Spectral embedding methods have played an important role in dimensionality reduction and feature generation in machine learning. Supervised spectral embedding methods additionally improve the classification of labeled data by using proximity information that accounts for both features and class labels. However, these methods compute proximity homogeneously: they treat intraclass similarities the same way for every class, and likewise treat all interclass sample pairs the same way. In this paper, we propose a novel and generic method that treats intra- and interclass sample similarities heterogeneously, potentially using a different proximity function for each class and for each class pair. To handle the complexity of selecting these functions, we employ evolutionary programming as a powerful automated formula-induction engine. In addition, for computational efficiency and expressive power, we use a compact matrix tree representation equipped with a broad set of functions that can build most currently used similarity functions as well as new ones. Model selection is data driven: the entire model is symbolically instantiated using only the problem's training data, and no user-selected functions or parameters are required. We perform thorough comparative experiments on multiple classification datasets against many existing state-of-the-art embedding methods, and the results show that the proposed algorithm is very competitive in terms of classification accuracy and generalization ability.
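To make "heterogeneous" proximity concrete, here is a minimal NumPy sketch; it is not the authors' implementation. It assumes a linear projection that maximizes between-class Laplacian spread relative to within-class spread (a Marginal-Fisher-style objective chosen only for illustration, since the abstract does not state the exact embedding criterion). It also uses hand-picked proximity functions (a Gaussian `rbf` and an inverse-quadratic kernel) where the paper would induce one function per class and per class pair by evolutionary programming. All function names (`heterogeneous_weights`, `supervised_embedding`, etc.) are hypothetical.

```python
from itertools import combinations

import numpy as np


def rbf(sigma):
    """Gaussian (heat-kernel) proximity applied to squared distances."""
    return lambda d2: np.exp(-d2 / (2.0 * sigma ** 2))


def inverse_quadratic(d2):
    """A different proximity function, 1 / (1 + d^2), for illustration."""
    return 1.0 / (1.0 + d2)


def pairwise_sq_dists(X):
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def heterogeneous_weights(X, y, intra, inter):
    """Within-class (Ww) and between-class (Wb) weight matrices.

    intra[c] is the proximity function used inside class c, and
    inter[(a, b)] (with a < b) the one used between classes a and b,
    so every class and class pair may use a different function.
    """
    d2 = pairwise_sq_dists(X)
    n = len(y)
    Ww, Wb = np.zeros((n, n)), np.zeros((n, n))
    classes = np.unique(y).tolist()
    for c in classes:
        block = np.outer(y == c, y == c)
        Ww[block] = intra[c](d2[block])
    np.fill_diagonal(Ww, 0.0)  # no self-similarity
    for a, b in combinations(classes, 2):
        block = np.outer(y == a, y == b) | np.outer(y == b, y == a)
        Wb[block] = inter[(a, b)](d2[block])
    return Ww, Wb


def laplacian(W):
    return np.diag(W.sum(axis=1)) - W


def supervised_embedding(X, y, intra, inter, k=2, reg=1e-6):
    """Projection P (d x k) solving X^T Lb X p = lam (X^T Lw X + reg I) p
    for the k largest eigenvalues (assumed objective, see lead-in)."""
    Ww, Wb = heterogeneous_weights(X, y, intra, inter)
    A = X.T @ laplacian(Wb) @ X
    B = X.T @ laplacian(Ww) @ X + reg * np.eye(X.shape[1])
    # Reduce the generalized symmetric problem to a standard one: B = L L^T.
    Linv = np.linalg.inv(np.linalg.cholesky(B))
    C = Linv @ A @ Linv.T
    C = (C + C.T) / 2.0  # guard against round-off asymmetry
    _, vecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :k]
    return Linv.T @ top  # back-transform: p = L^{-T} v


# Toy usage: three Gaussian blobs in 4-D, one proximity function per class/pair.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(20, 4)) for loc in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 20)
intra = {0: rbf(1.0), 1: rbf(0.5), 2: inverse_quadratic}
inter = {(0, 1): rbf(2.0), (0, 2): rbf(1.0), (1, 2): inverse_quadratic}
P = supervised_embedding(X, y, intra, inter, k=2)
Z = X @ P
print(Z.shape)  # (60, 2)
```

Replacing the dictionaries `intra` and `inter` with a single shared function reproduces the homogeneous treatment that the abstract criticizes; the point of the paper's method is to search over these per-class and per-pair functions automatically rather than fixing them by hand as done here.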