Wu Jian, Sheng Victor S, Zhang Jing, Li Hua, Dadakova Tetiana, Swisher Christine Leon, Cui Zhiming, Zhao Pengpeng
Soochow University, China and Human Longevity, Inc., USA.
Texas Tech University, USA.
ACM Comput Surv. 2020 Jun;53(2). doi: 10.1145/3379504. Epub 2020 Mar 13.
Image classification is a key task in image understanding, and multi-label image classification has become a popular topic in recent years. However, the success of multi-label image classification depends closely on how the training set is constructed. Active learning aims to construct an effective training set by iteratively selecting the most informative examples and querying annotators for their labels, and it has therefore been introduced into multi-label image classification. Accordingly, multi-label active learning is becoming an important research direction. In this work, we first review existing multi-label active learning algorithms for image classification. These algorithms can be categorized into two top-level groups, according to two aspects: sampling and annotation. The most important component of multi-label active learning is the design of an effective sampling strategy that actively selects the most informative examples from an unlabeled data pool according to various informativeness measures. This survey therefore emphasizes the different informativeness measures. Furthermore, this work investigates in depth the open challenges and future directions in multi-label active learning, with a focus on four core aspects: example dimension, label dimension, annotation, and application extension.
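To make the sampling step described in the abstract concrete, the following is a minimal sketch of one possible informativeness measure: the average per-label binary entropy of a model's predicted label probabilities. It is an illustrative assumption, not a method taken from the survey. The function names `mean_binary_entropy` and `select_queries`, the array `pool_probs`, and its values are all hypothetical.

```python
import numpy as np


def mean_binary_entropy(probs, eps=1e-12):
    """Average per-label binary entropy (in bits) for each example.

    probs: array-like of shape (n_examples, n_labels) holding the predicted
    probability that each label applies to each example.
    """
    # Clip to avoid log2(0) for fully confident predictions.
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    entropy = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return entropy.mean(axis=1)


def select_queries(probs, batch_size):
    """Return indices of the batch_size examples with the highest score."""
    scores = mean_binary_entropy(probs)
    # argsort is ascending, so negate the scores to rank the highest first.
    return np.argsort(-scores, kind="stable")[:batch_size]


# Hypothetical predicted probabilities for 4 unlabeled images and 3 labels.
pool_probs = np.array([
    [0.99, 0.01, 0.98],  # confident on every label
    [0.50, 0.55, 0.45],  # uncertain on every label
    [0.90, 0.50, 0.10],  # uncertain on one label only
    [0.70, 0.30, 0.60],  # moderately uncertain on every label
])

# Indices of the two examples that would be sent to annotators.
queries = select_queries(pool_probs, batch_size=2)
```

In a full pool-based loop, the selected examples would be labeled by annotators, moved from the unlabeled pool into the training set, and the classifier retrained before the next round of selection. Averaging the entropy over labels treats every label as equally important; that is a simplifying assumption of this sketch.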