Wang Gang, Wang Fei, Chen Tao, Yeung Dit-Yan, Lochovsky Frederick H
Tencent Inc., Beijing 100080, China.
IEEE Trans Syst Man Cybern B Cybern. 2012 Apr;42(2):308-19. doi: 10.1109/TSMCB.2011.2168205. Epub 2011 Oct 14.
Traditional learning algorithms use only labeled data for training. However, labeled examples are often difficult or time-consuming to obtain since they require substantial human labeling effort. On the other hand, unlabeled data are often relatively easy to collect. Semisupervised learning addresses this problem by using large quantities of unlabeled data together with labeled data to build better learning algorithms. In this paper, we use the manifold regularization approach to formulate the semisupervised learning problem, establishing a regularization framework that balances a tradeoff between loss and penalty. We investigate different implementations of the loss function and identify the methods that have the least computational expense. The regularization hyperparameter, which determines the balance between loss and penalty, is crucial to model selection. Accordingly, we derive an algorithm that can fit the entire path of solutions for every value of the hyperparameter. Its computational complexity after preprocessing is quadratic only in the number of labeled examples rather than the total number of labeled and unlabeled examples.
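For context, the manifold regularization framework referenced in the abstract is commonly written (following Belkin, Niyogi, and Sindhwani) as an objective over l labeled and u unlabeled examples; the notation below is a sketch of that standard form, not taken verbatim from this paper:

\min_{f \in \mathcal{H}_K} \; \frac{1}{l} \sum_{i=1}^{l} V\bigl(x_i, y_i, f(x_i)\bigr) \;+\; \gamma_A \|f\|_K^2 \;+\; \frac{\gamma_I}{(l+u)^2}\, \mathbf{f}^{\top} L \, \mathbf{f}

Here V is the loss on the labeled examples, \|f\|_K^2 is the ambient penalty in the reproducing kernel Hilbert space, \mathbf{f}^{\top} L \mathbf{f} is the graph-Laplacian (manifold) penalty over all l+u points, and the hyperparameters \gamma_A and \gamma_I set the loss-penalty tradeoff whose solution path the abstract describes.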