School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, USA.
IEEE Trans Pattern Anal Mach Intell. 2010 May;32(5):788-98. doi: 10.1109/TPAMI.2009.98.
In many signal processing applications, grouping features during model development and selecting a small number of relevant groups can improve the interpretability of the learned parameters. While much work based on linear models has addressed this problem, in recent years multiple kernel learning has emerged as a candidate for solving it in nonlinear models. Because all multiple kernel learning algorithms to date use convex primal problem formulations, the kernel weights they select are not strictly the sparsest possible solution. The main reason for using a convex primal formulation is that efficient implementations of kernel-based methods invariably rely on solving the dual problem. This work proposes adding a log-based concave penalty term to the primal problem to induce sparsity over groups of parameters. A generalized iterative learning algorithm, which can be used with a linear combination of this concave penalty term and other penalty terms, is given for model parameter estimation in the primal space. It is then shown that a natural extension of the method to nonlinear models via the "kernel trick" yields a new algorithm, called Sparse Multiple Kernel Learning (SMKL), which generalizes group-feature selection to kernel selection. SMKL can exploit existing efficient single-kernel algorithms while providing a sparser solution, in terms of the number of kernels used, than the existing multiple kernel learning framework. Signal processing examples based on mass spectra for cancer detection, hyperspectral imagery for land cover classification, and NIR spectra from wheat, fescue grass, and diesel highlight the ability of SMKL to achieve very high accuracy with very few kernels.
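The core idea of the concave-penalty iteration can be illustrated on a linear group-selection problem. The following is a minimal sketch, not the authors' exact algorithm: it assumes a log penalty of the form sum over groups of log(eps + ||w_g||), minimized by majorize-minimize reweighting, where each outer step solves a weighted ridge problem. All function and variable names here are hypothetical.

```python
import numpy as np

def group_select_log_penalty(X, y, groups, lam=0.1, eps=1e-3, n_iter=20):
    """Toy sketch of group-sparse regression with a concave log penalty.

    Objective (illustrative): ||X w - y||^2 + lam * sum_g log(eps + ||w_g||).
    The concave penalty is handled by iterative reweighting (majorize-
    minimize): each group's ridge weight is lam / (eps + ||w_g||), so
    groups that shrink toward zero are penalized ever more heavily,
    which is what drives sparsity beyond a convex (e.g. l1) formulation.
    """
    n, d = X.shape
    w = np.zeros(d)
    group_ids = np.unique(groups)
    for _ in range(n_iter):
        # MM step: reweight each group by the local slope of the log penalty.
        scale = np.ones(d)
        for g in group_ids:
            idx = groups == g
            scale[idx] = lam / (eps + np.linalg.norm(w[idx]))
        # Inner step: solve the weighted ridge problem
        # (X^T X + diag(scale)) w = X^T y.
        w = np.linalg.solve(X.T @ X + np.diag(scale), X.T @ y)
    return w
```

In the SMKL setting described above, each "group" would instead correspond to one kernel's parameter block, so the same reweighting drives whole kernels out of the model.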