Gold Carl, Holub Alex, Sollich Peter
Computation and Neural Systems, California Institute of Technology, 139-74, Pasadena, CA 91125, USA.
Neural Netw. 2005 Jun-Jul;18(5-6):693-701. doi: 10.1016/j.neunet.2005.06.044.
A Bayesian view of support vector machine (SVM) classifiers allows the definition of a quantity analogous to the evidence in probabilistic models. By maximizing this quantity one can systematically tune hyperparameters and, via automatic relevance determination (ARD), select relevant input features. Evidence gradients are expressed as averages over the associated posterior and can be approximated using Hybrid Monte Carlo (HMC) sampling. We describe how a Nyström approximation of the Gram matrix can be used to reduce sampling times significantly while leaving classification accuracy almost unchanged. In experiments on classification problems with many irrelevant features, this approach to ARD can substantially improve classification performance over more traditional, non-ARD SVM systems. The final tuned hyperparameter values provide a useful criterion for pruning irrelevant features, and we define a measure of relevance with which to determine systematically how many features should be removed. This use of ARD for hard feature selection can improve classification accuracy in non-ARD SVMs. In the majority of cases, however, we find that for data sets constructed by human domain experts the performance of non-ARD SVMs is largely insensitive to the presence of some less relevant features. Eliminating such features via ARD then does not improve classification accuracy, but it does reduce the number of features required by up to 75%.
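To make the Nyström step concrete, here is a minimal NumPy sketch of a low-rank Gram matrix approximation built from an ARD kernel. It is an illustration under stated assumptions, not the authors' implementation: the squared-exponential kernel form, the uniform random choice of landmark points, the pseudo-inverse cutoff, and the names `ard_rbf_kernel` and `nystrom_gram` are all hypothetical choices made for this example. The abstract does not specify any of them.

```python
import numpy as np


def ard_rbf_kernel(X, Y, lengthscales):
    """Squared-exponential kernel with one length scale per input feature (ARD).

    A very large length scale makes the kernel nearly insensitive to that
    feature, which is how ARD expresses "irrelevant".
    """
    Xs = X / lengthscales
    Ys = Y / lengthscales
    sq_dist = (
        (Xs ** 2).sum(axis=1)[:, None]
        + (Ys ** 2).sum(axis=1)[None, :]
        - 2.0 * Xs @ Ys.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0))


def nystrom_gram(X, lengthscales, m, rng):
    """Rank-m Nystrom approximation K ~= C W^+ C^T.

    Assumption: the m landmark points are drawn uniformly at random
    without replacement from the rows of X.
    """
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = ard_rbf_kernel(X, X[idx], lengthscales)  # n x m cross-kernel block
    W = C[idx]                                   # m x m landmark block
    # Pseudo-inverse with a small cutoff guards against an ill-conditioned W.
    K_approx = C @ np.linalg.pinv(W, rcond=1e-10) @ C.T
    return K_approx, idx


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
# Two relevant features, six effectively switched off by huge length scales.
lengthscales = np.array([1.0, 1.0] + [1e3] * 6)

K = ard_rbf_kernel(X, X, lengthscales)
K_approx, idx = nystrom_gram(X, lengthscales, m=50, rng=rng)
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print(f"relative Frobenius error with 50 of 200 landmarks: {rel_err:.2e}")
```

Only the n x m block `C` requires kernel evaluations, so the cost of building the approximation scales with the number of landmarks rather than with the full n x n Gram matrix. The full matrix `K` is computed here purely to measure the approximation error.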