IEEE Trans Cybern. 2019 Dec;49(12):4460-4472. doi: 10.1109/TCYB.2018.2869861. Epub 2018 Sep 28.
Batch-mode active learning algorithms select a batch of valuable unlabeled samples for manual annotation, which reduces the total cost compared with labeling every unlabeled sample. To facilitate this selection, many batch-mode active learning algorithms map samples to the reproducing kernel Hilbert space induced by a radial-basis function (RBF) kernel. Choosing a proper value for the RBF kernel parameter is crucial for such algorithms. In this paper, a hypothesis-margin-based criterion function is proposed for automatic tuning of the kernel parameter. Three frameworks are also developed to incorporate automatic tuning of the kernel parameter into existing batch-mode active learning algorithms. In the proposed frameworks, the kernel parameter can be tuned in a single stage or in multiple stages. Single-stage tuning aims to make the kernel parameter suitable for selecting the specified number of unlabeled samples. When the kernel parameter is tuned in multiple stages, the incorporated active learning algorithm can be forced to make coarse-to-fine evaluations of the importance of unlabeled samples. The proposed framework can also improve the scalability of existing batch-mode active learning algorithms that satisfy a decomposition property. Experimental results on data sets comprising hundreds to hundreds of thousands of samples show the feasibility of the proposed framework.
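To make the role of the kernel parameter concrete, the following is a minimal sketch of how an RBF kernel parameter could be chosen by maximizing a hypothesis margin on labeled samples. It is not the paper's criterion function, which the abstract does not specify. It assumes the common nearest-hit/nearest-miss definition of the hypothesis margin, an RBF kernel of the form exp(-gamma * ||x - y||^2), and a simple search over a candidate list. The names `rbf_kernel`, `mean_hypothesis_margin`, `tune_gamma`, and `gamma` are hypothetical and chosen for illustration only.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||x - y||^2)."""
    sq_dist = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mean_hypothesis_margin(X, y, gamma):
    """Average nearest-hit/nearest-miss margin in the RBF feature space.

    Assumed definition (not taken from the paper):
        margin(x) = 0.5 * (d(x, nearest miss) - d(x, nearest hit))
    where d is the feature-space distance, d^2 = 2 - 2 * k(x, x').
    Requires at least two classes and at least two samples per class.
    """
    K = rbf_kernel(X, X, gamma)
    D = np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))
    same = y[:, None] == y[None, :]
    hit = np.where(same, D, np.inf)
    np.fill_diagonal(hit, np.inf)  # a sample is not its own nearest hit
    miss = np.where(~same, D, np.inf)
    return float(np.mean(0.5 * (miss.min(axis=1) - hit.min(axis=1))))


def tune_gamma(X, y, candidates):
    """Return the candidate gamma with the largest mean hypothesis margin."""
    scores = [mean_hypothesis_margin(X, y, g) for g in candidates]
    return candidates[int(np.argmax(scores))]
```

In this sketch, a very small `gamma` makes all samples nearly identical in feature space and a very large `gamma` makes them all nearly equidistant, so the margin shrinks toward zero at both extremes. That behavior is one reason a margin-style criterion can serve as a tuning signal.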