Qi Guo-Jun, Hua Xian-Sheng, Rui Yong, Tang Jinhui, Zhang Hong-Jiang
Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign, Urbana, IL 61801-2918, USA.
IEEE Trans Pattern Anal Mach Intell. 2009 Oct;31(10):1880-97. doi: 10.1109/TPAMI.2008.218.
Conventional active learning dynamically constructs the training set only along the sample dimension. While this is the right strategy in binary classification, it is suboptimal for multilabel image classification. We argue that for each selected sample, only some effective labels need to be annotated, while the others can be inferred by exploiting the label correlations. The reason is that, because of the inherent label correlations, different labels contribute differently to minimizing the classification error. To this end, we propose to select sample-label pairs, rather than only samples, to minimize a multilabel Bayesian classification error bound. We call this two-dimensional active learning because it considers both the sample dimension and the label dimension. Furthermore, as the number of training samples increases rapidly over time due to active learning, it becomes intractable for an offline learner to retrain a new model on the whole training set. We therefore develop an efficient online learner that adapts the existing model into a new one by minimizing the distance between the two models under a set of multilabel constraints. The effectiveness and efficiency of the proposed method are evaluated on two benchmark data sets and a realistic image collection from a real-world image-sharing Web site, Corbis.
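The idea of selecting sample-label pairs rather than whole samples can be illustrated with a small sketch. The scoring rule below is a simplified heuristic and not the multilabel Bayesian error bound from the paper: it assumes that the value of annotating label j on sample s is the entropy of that label plus a correlation-weighted share of the uncertainty of the other labels on the same sample. The function names (`binary_entropy`, `select_pairs`) and the inputs (`probs`, `label_corr`, `annotated`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def select_pairs(probs, label_corr, annotated, k):
    """Pick up to k (sample, label) pairs to send to the annotator.

    probs:      (n_samples, n_labels) predicted P(label = 1 | sample)
    label_corr: (n_labels, n_labels) label correlation strengths in [0, 1]
                (an assumed input, e.g. estimated from the training set)
    annotated:  (n_samples, n_labels) bool mask, True where a label is known
    """
    # Remaining uncertainty per label; already-annotated labels have none.
    ent = np.where(annotated, 0.0, binary_entropy(probs))

    corr = np.array(label_corr, dtype=float)
    np.fill_diagonal(corr, 1.0)  # a label fully informs itself

    # Annotating label j can reveal at most min(H_j, H_i) bits about label i,
    # so weight that cap by the correlation between j and i and sum over i.
    shared = np.minimum(ent[:, :, None], ent[:, None, :])  # (n, m, m)
    score = (shared * corr[None, :, :]).sum(axis=2)        # (n, m)

    score[annotated] = -np.inf  # never re-request a known label

    # Stable sort so ties are broken by (sample, label) order.
    order = np.argsort(-score.ravel(), kind="stable")
    n_labels = probs.shape[1]
    pairs = []
    for flat in order[:k]:
        s, j = divmod(int(flat), n_labels)
        if np.isfinite(score[s, j]):
            pairs.append((s, j))
    return pairs


if __name__ == "__main__":
    probs = np.array([[0.5, 0.5, 0.99],
                      [0.9, 0.1, 0.5]])
    label_corr = np.array([[1.0, 0.8, 0.0],
                           [0.8, 1.0, 0.0],
                           [0.0, 0.0, 1.0]])
    annotated = np.zeros_like(probs, dtype=bool)
    print(select_pairs(probs, label_corr, annotated, k=2))
```

In this toy input, labels 0 and 1 are strongly correlated and both uncertain on sample 0, so pairs from that sample rank highest; the confidently predicted label 2 on sample 0 is left unannotated, which is the kind of saving the label dimension is meant to provide.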