Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan.
Meidensha Corporation, 2-1-1, Osaki, Shinagawa, Tokyo 141-0032, Japan.
Sensors (Basel). 2022 Jul 13;22(14):5244. doi: 10.3390/s22145244.
Active learning is one way to improve annotation efficiency. Its goal is to select, from a large pool of unlabeled images, those whose labeling will most improve the accuracy of the machine learning model. To select the most informative unlabeled images, conventional methods use deep neural networks with a large number of computation nodes and long computation times; we instead propose a non-deep-neural-network method that requires no additional training for unlabeled image selection. The proposed method trains a task model on labeled images and then uses that model to make predictions on the unlabeled images. From these predictions, an uncertainty indicator is computed for each unlabeled image. Images with a high uncertainty indicator are considered to carry a high information content and are selected for annotation. The method rests on a simple but powerful idea: select samples near the decision boundary of the model. Experimental results on multiple datasets show that the proposed method achieves higher accuracy than conventional active learning methods on multiple tasks and runs up to 14 times faster, reducing execution time from 1.2 × 10^6 s to 8.3 × 10^4 s. On CIFAR-10, the proposed method outperforms the current state-of-the-art (SoTA) method by 1% in accuracy.
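The selection step described in the abstract (predict on unlabeled images, score each by uncertainty, pick the highest-scoring ones) can be illustrated with a minimal sketch. The abstract does not define the uncertainty indicator, so this example assumes a common stand-in: one minus the margin between the two largest softmax probabilities, which is large for samples near the decision boundary. The names `softmax`, `margin_uncertainty`, `select_for_annotation`, `predict_logits`, and `budget` are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def margin_uncertainty(probs):
    """Assumed indicator: 1 - (top-1 probability - top-2 probability).

    Requires at least two classes. Values near 1 mean the two best classes
    are almost tied, i.e. the sample lies close to the decision boundary.
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def select_for_annotation(predict_logits, unlabeled, budget):
    """Return indices of the `budget` most uncertain unlabeled samples.

    `predict_logits` stands in for the task model already trained on the
    labeled images; no additional model is trained for the selection.
    """
    scored = [
        (margin_uncertainty(softmax(predict_logits(x))), i)
        for i, x in enumerate(unlabeled)
    ]
    # Highest uncertainty first; ties are broken by original index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:budget]]


if __name__ == "__main__":
    # Toy "model": the inputs are already per-class logits.
    pool = [[5.0, 0.0, 0.0], [1.0, 0.9, 0.0], [2.0, 0.0, 0.0]]
    print(select_for_annotation(lambda x: x, pool, budget=2))
```

In this toy pool the second sample has two nearly tied classes, so it ranks as most uncertain, while the first sample is predicted confidently and is left unselected. In an active learning loop, the selected samples would be annotated, added to the labeled set, and the task model retrained before the next round.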