Roda Hezi, Geva Amir B
Electrical and Computer Engineering, Ben-Gurion University, Be'er Sheva, Israel.
InnerEye Ltd CTO, Herzliya, Israel.
Front Artif Intell. 2024 May 30;7:1398844. doi: 10.3389/frai.2024.1398844. eCollection 2024.
Active learning is a field of machine learning that seeks to select the most informative samples to annotate within a given labeling budget, particularly in cases where obtaining labeled data is expensive or infeasible. It is becoming increasingly important with the growing success of learning-based methods, which often require large amounts of labeled data. Computer vision is one area where active learning has shown promise, in tasks such as image classification, semantic segmentation, and object detection. In this research, we propose a pool-based semi-supervised active learning method for image classification that takes advantage of both labeled and unlabeled data. Many active learning approaches do not utilize unlabeled data, but we believe that incorporating these data can improve performance. Our method involves several steps. First, we cluster the latent space of a pre-trained convolutional autoencoder. Then, we use a proposed clustering contrastive loss, together with a small amount of labeled data, to strengthen the clustering of the latent space. Finally, we query the samples with the highest uncertainty and send them to an oracle for annotation. We repeat this process until the budget is exhausted. Our method is effective when the number of annotated samples is small, and experiments on benchmark datasets validate its effectiveness in terms of image classification accuracy.
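The cluster-then-query loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the latent vectors `z` are assumed to come from an already pre-trained autoencoder, the clustering is plain k-means, uncertainty is taken to be the entropy of a softmax over negative squared distances to the cluster centers, and the fine-tuning step with the clustering contrastive loss is omitted because the abstract does not define it. All function names and parameters (`kmeans`, `soft_assignments`, `active_learning_loop`, `batch_size`) are hypothetical.

```python
import numpy as np


def kmeans(z, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on latent vectors z of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing returns a copy, so updating centers never touches z.
    centers = z[rng.choice(len(z), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (n, k).
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = z[assign == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def soft_assignments(z, centers):
    """Cluster probabilities as a softmax over negative squared distances.

    This particular form is an assumption made for the sketch.
    """
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def entropy(p, eps=1e-12):
    """Shannon entropy of each row of p; higher means more uncertain."""
    return -(p * np.log(p + eps)).sum(axis=1)


def active_learning_loop(z, oracle, k, budget, batch_size):
    """Query the most uncertain pool samples until the budget is spent."""
    budget = min(budget, len(z))
    labeled = {}  # pool index -> label returned by the oracle
    while len(labeled) < budget:
        centers = kmeans(z, k)
        # In the described method, the latent space would be refined here
        # with the clustering contrastive loss and the labels gathered so
        # far. That step is omitted, so z stays fixed in this sketch.
        scores = entropy(soft_assignments(z, centers))
        already = np.array(list(labeled), dtype=int)
        scores[already] = -np.inf  # never re-query labeled samples
        n_query = min(batch_size, budget - len(labeled))
        for i in np.argsort(scores)[-n_query:]:
            labeled[int(i)] = oracle(int(i))
    return labeled
```

A call might look like `active_learning_loop(z, oracle=lambda i: int(y[i]), k=10, budget=100, batch_size=10)`, where `y` stands in for a human annotator. Entropy is only one of several common uncertainty measures; margin or least-confidence scores would slot into the same loop.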