School of Computer Science & Technology, Taiyuan University of Science and Technology, Taiyuan, China.
Department of Computer Science & Technology, Xinzhou Teachers University, Xinzhou, China.
PLoS One. 2020 Sep 23;15(9):e0238956. doi: 10.1371/journal.pone.0238956. eCollection 2020.
This study proposes a convolutional neural network with threshold optimization (CNN-THOP) to address the overlabeling or underlabeling that arises in multilabel image annotation when labels are assigned by applying a ranking function to the prediction probabilities. The model fuses a threshold optimization algorithm into the CNN structure. First, batch normalization (BN) is added to the CNN structure to accelerate convergence, and the optimal model obtained from training is used to predict the test set images, yielding a group of prediction probabilities. Second, threshold optimization is performed on these prediction probabilities to derive an optimal threshold for each label class, forming a group of optimal thresholds. When the prediction probability for a label class is greater than or equal to the corresponding optimal threshold, that label is assigned to the image. At annotation time, multilabel annotation of a new image is carried out by loading the optimal model and the optimal thresholds. Verification experiments are performed on the MIML, COREL5K, and MSRC datasets. Compared with the MBRM, the CNN-THOP increases the average precision on MIML, COREL5K, and MSRC by 27%, 28%, and 33%, respectively. Compared with the E2E-DCNN, the CNN-THOP increases the average recall rate by 3% on both COREL5K and MSRC. The CNN-THOP annotates most precisely on the MIML dataset, where the complete matching degree reaches 64.8%.
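The per-class thresholding step can be illustrated with a minimal sketch. The abstract does not state which criterion or search procedure the threshold optimization uses, so the grid search that maximizes per-class F1 below is an assumption made for illustration, not the authors' algorithm. The function names (`optimize_thresholds`, `annotate`) and the candidate grid are likewise hypothetical. The sketch takes a matrix of prediction probabilities and a matrix of binary ground-truth labels, both of shape (images, classes).

```python
import numpy as np


def f1_binary(y_true, y_pred):
    """F1 score for one label class; returns 0.0 when undefined."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def optimize_thresholds(probs, labels, candidates=None):
    """Pick one threshold per class by grid search.

    Assumption: the selection criterion is per-class F1 on the given
    data; the source does not specify the criterion actually used.
    """
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)  # hypothetical grid
    n_classes = probs.shape[1]
    thresholds = np.empty(n_classes)
    for c in range(n_classes):
        scores = [
            f1_binary(labels[:, c], (probs[:, c] >= t).astype(int))
            for t in candidates
        ]
        # argmax returns the first (lowest) candidate with the best score
        thresholds[c] = candidates[int(np.argmax(scores))]
    return thresholds


def annotate(probs, thresholds):
    """Assign a label when its probability >= that class's threshold."""
    return (probs >= thresholds).astype(int)
```

In this sketch, `annotate` mirrors the decision rule stated in the abstract (probability greater than or equal to the class threshold), so the number of labels per image is not fixed in advance, unlike a top-k ranking rule.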