IEEE Trans Neural Netw Learn Syst. 2017 Jul;28(7):1668-1681. doi: 10.1109/TNNLS.2016.2542184. Epub 2016 Apr 18.
While active learning (AL) has been widely studied for classification problems, relatively little effort has been devoted to AL for regression. In this paper, we introduce a new AL framework for regression, expected model change maximization (EMCM), which aims to choose the unlabeled data instances that, once labeled, result in the maximum change of the current model. The model change is quantified as the difference between the current model parameters and the updated parameters after the inclusion of the newly selected examples. Inspired by the stochastic gradient descent learning rule, we approximate this change by the gradient of the loss function evaluated at each candidate instance. Under the EMCM framework, we propose novel AL algorithms for both linear and nonlinear regression models. In addition, by simulating the behavior of the sequential AL policy over k iterations, we extend the algorithms to batch-mode AL, which simultaneously selects the k most informative instances at each query. Extensive experimental results on both UCI and StatLib benchmark data sets demonstrate that the proposed algorithms are highly effective and efficient.
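As a concrete illustration of the idea, the following is a minimal sketch of EMCM-style query selection for linear regression with squared loss, assuming the expectation over the unknown label is approximated with a bootstrap ensemble and that batch selection simply takes the top-k scores (a simplification of the paper's simulated sequential policy). The function name `emcm_select` and its parameters are illustrative, not the authors' reference implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.utils import resample


def emcm_select(X_labeled, y_labeled, X_pool, n_bootstrap=4, batch_size=1, seed=0):
    """Rank pool instances by the expected gradient norm of the squared loss.

    For linear regression with squared loss, the gradient contributed by a
    single example (x, y) is (f(x) - y) * x, so the model change induced by
    labeling x is approximated by E_y[ ||(f(x) - y) * x|| ], with the unknown
    label y integrated out via a bootstrap ensemble of regressors.
    """
    rng = np.random.RandomState(seed)

    # Current model trained on the labeled data.
    model = LinearRegression().fit(X_labeled, y_labeled)
    f_pool = model.predict(X_pool)

    # Bootstrap ensemble approximating the distribution of the unknown label.
    ensemble_preds = []
    for _ in range(n_bootstrap):
        Xb, yb = resample(X_labeled, y_labeled, random_state=rng)
        ensemble_preds.append(LinearRegression().fit(Xb, yb).predict(X_pool))
    ensemble_preds = np.stack(ensemble_preds)              # shape (K, n_pool)

    # Expected gradient norm per candidate: mean_k |f(x) - y_k| * ||x||.
    residuals = np.abs(f_pool[None, :] - ensemble_preds)   # shape (K, n_pool)
    scores = residuals.mean(axis=0) * np.linalg.norm(X_pool, axis=1)

    # Take the top-scoring candidates; batch_size > 1 is a naive batch query.
    return np.argsort(scores)[::-1][:batch_size]
```

In use, the indices returned by `emcm_select` would be sent to the oracle for labeling, the labeled instances moved from the pool to the training set, and the procedure repeated for the desired number of query rounds.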