Schölkopf B, Platt J C, Shawe-Taylor J, Smola A J, Williamson R C
Microsoft Research Ltd, Cambridge CB2 3NH, U.K.
Neural Comput. 2001 Jul;13(7):1443-71. doi: 10.1162/089976601750264965.
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
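The quadratic program mentioned in the abstract can be illustrated with a small sketch. It assumes the commonly stated dual form of the problem: minimize 0.5 * sum_ij a_i a_j k(x_i, x_j) subject to 0 <= a_i <= 1/(nu * n) and sum_i a_i = 1, where nu is the prespecified fraction and n is the number of training points. The Gaussian kernel, the uniform starting point, the rule for picking the pair to optimize, and all function names (`rbf`, `fit_one_class`, `decision`) are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_one_class(X, nu=0.2, gamma=0.5, tol=1e-5, max_iter=100000):
    """Pairwise optimization of the assumed dual; requires 0 < nu < 1.

    Minimizes 0.5 * a^T K a subject to 0 <= a_i <= 1/(nu*n), sum(a) = 1.
    Returns the expansion coefficients and the offset rho.
    """
    n = len(X)
    C = 1.0 / (nu * n)                       # upper bound on each coefficient
    K = [[rbf(xi, xj, gamma) for xj in X] for xi in X]
    alpha = [1.0 / n] * n                    # feasible start (a simplification)
    out = [sum(alpha[k] * K[i][k] for k in range(n)) for i in range(n)]
    eps = 1e-12
    rho = 0.0
    for _ in range(max_iter):
        # i: largest output among coefficients that can still shrink.
        # j: smallest output among coefficients that can still grow.
        i = max((k for k in range(n) if alpha[k] > eps), key=lambda k: out[k])
        j = min((k for k in range(n) if alpha[k] < C - eps), key=lambda k: out[k])
        rho = 0.5 * (out[i] + out[j])        # offset estimate from the pair
        if out[i] - out[j] < tol:            # no violating pair left: stop
            break
        # Move weight from i to j; the sum of coefficients stays fixed.
        eta = max(K[i][i] + K[j][j] - 2.0 * K[i][j], 1e-12)
        delta = (out[i] - out[j]) / eta      # unconstrained optimum of the step
        delta = min(delta, alpha[i], C - alpha[j])   # clip to the box
        alpha[i] -= delta
        alpha[j] += delta
        for k in range(n):                   # keep cached outputs in sync
            out[k] += delta * (K[k][j] - K[k][i])
    return alpha, rho


def decision(x, X, alpha, rho, gamma=0.5):
    """f(x) = sum_i a_i k(x_i, x) - rho; non-negative inside the estimated region."""
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X) if a > 0) - rho


# Toy data: 40 points from a 2-D standard normal distribution.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
alpha, rho = fit_one_class(X, nu=0.2)
outside = sum(decision(x, X, alpha, rho) < -1e-5 for x in X)
print(f"{outside}/{len(X)} training points fall outside the estimated region")
print("score of a far-away point:", decision((8.0, 8.0), X, alpha, rho))
```

In this sketch only the points with nonzero coefficients contribute to `decision`, which mirrors the "potentially small subset of the training data" in the abstract. Because each coefficient is capped at 1/(nu * n) and the coefficients sum to one, at most a fraction nu of the training points can sit at the cap, so nu bounds the share of training points left outside the region.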