IEEE Trans Cybern. 2016 Feb;46(2):462-73. doi: 10.1109/TCYB.2015.2403573. Epub 2015 Feb 27.
Universum, a collection of nonexamples that do not belong to any class of interest, has become a new research topic in machine learning. This paper devises a semi-supervised learning algorithm with Universum based on the boosting technique, focusing on situations where only a few labeled examples are available. We also show that the training error of AdaBoost with Universum is bounded by the product of the normalization factors, and that the training error drops exponentially fast when each weak classifier is slightly better than random guessing. Finally, experiments are conducted on four data sets under several combinations of settings. The experimental results indicate that the proposed algorithm benefits from Universum examples and outperforms several alternative methods, particularly when the available labeled examples are insufficient. When the number of labeled examples is too small to estimate the parameters of the classification functions, the Universum can be used to approximate the prior distribution of the classification functions. This observation matches the concept of Universum introduced by Vapnik: Universum examples implicitly specify a prior distribution on the set of classification functions.
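As context for the bound stated above: for standard AdaBoost, with normalization factors Z_t and weak-learner edges gamma_t (the round-t weighted error is 1/2 - gamma_t) over T rounds, the training error satisfies

\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\!\left[H(x_i)\neq y_i\right] \;\le\; \prod_{t=1}^{T} Z_t \;\le\; \exp\!\left(-2\sum_{t=1}^{T}\gamma_t^{2}\right),

which is the classical Freund-Schapire result; the paper proves a bound of the same form for its Universum variant, whose exact statement is not reproduced in this abstract.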
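A minimal sketch of how Universum examples can enter a boosting loop, assuming the common "contradiction" encoding due to Vapnik: each Universum point is duplicated with both labels, so the learner is rewarded for staying uncertain on it. The function and parameter names (adaboost_with_universum, univ_weight) are illustrative only; the abstract does not specify the paper's exact update rule.

# Illustrative sketch only: NOT the paper's exact algorithm. Universum points
# are added twice with opposite labels (Vapnik's contradiction encoding), then
# standard AdaBoost with decision stumps is run on the augmented sample.
import numpy as np

def stump_predict(X, feat, thresh, sign):
    # Decision stump: +/-1 depending on whether feature 'feat' is <= thresh.
    return sign * np.where(X[:, feat] <= thresh, 1.0, -1.0)

def fit_stump(X, y, w):
    # Exhaustively pick the stump minimizing the weighted training error.
    best = (0, 0.0, 1.0, np.inf)  # (feature, threshold, sign, error)
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1.0, -1.0):
                err = np.sum(w * (stump_predict(X, feat, thresh, sign) != y))
                if err < best[3]:
                    best = (feat, thresh, sign, err)
    return best

def adaboost_with_universum(X, y, X_univ, T=20, univ_weight=0.5):
    # Contradiction encoding: each Universum example appears with both labels.
    Xa = np.vstack([X, X_univ, X_univ])
    ya = np.concatenate([y, np.ones(len(X_univ)), -np.ones(len(X_univ))])
    w = np.concatenate([np.ones(len(y)),
                        univ_weight * np.ones(2 * len(X_univ))])
    w /= w.sum()
    ensemble = []
    for _ in range(T):
        feat, thresh, sign, err = fit_stump(Xa, ya, w)
        err = max(err, 1e-12)
        if err >= 0.5:  # weak learner no better than chance: stop early
            break
        alpha = 0.5 * np.log((1 - err) / err)
        h = stump_predict(Xa, feat, thresh, sign)
        w *= np.exp(-alpha * ya * h)
        w /= w.sum()  # normalization factor Z_t from the bound above
        ensemble.append((alpha, feat, thresh, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble)
    return np.sign(score)

With this encoding, no stump can be more than 50% correct on the Universum portion of the weight mass, so the weak learner's edge must come from the labeled examples, while the exponential reweighting keeps the ensemble's margin small on Universum points; the univ_weight parameter (introduced here for illustration) trades off these two effects.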