Information Technology, SEGi University, Kota Damansara, Petaling Jaya, Selangor, Malaysia.
Department of Computer Science and Technology, Xinzhou Teachers University, Xinzhou, China.
PLoS One. 2021 Dec 8;16(12):e0260758. doi: 10.1371/journal.pone.0260758. eCollection 2021.
This study aims to solve the overfitting problem caused by insufficient labeled images in automatic image annotation. We propose a transfer learning model, CNN-2L, that incorporates the label localization strategy described in this study. The model consists of an InceptionV3 network pretrained on the ImageNet dataset and a label localization algorithm. First, the pretrained InceptionV3 network extracts features from the target dataset; these features are used to train a task-specific classifier, and the entire network is then fine-tuned to obtain an optimal model. The resulting model is used to derive the probabilities of the predicted labels. To this end, we introduce a squeeze-and-excitation (SE) module into the network architecture that amplifies useful feature information, suppresses useless feature information, and reweights the feature channels. Next, we perform label localization to obtain the label probabilities and determine the final label set for each image. During this process, the number of labels must be determined: the optimal K value is obtained experimentally and used to fix the number of predicted labels, thereby solving the empty-label-set problem that occurs when all of an image's predicted label scores fall below a fixed threshold. Experiments on the Corel5k multilabel image dataset verify that CNN-2L improves the labeling precision by 18% and 15% over the traditional multiple-Bernoulli relevance model (MBRM) and the joint equal contribution (JEC) algorithm, respectively, and improves the recall by 6% over JEC. It also improves the precision by 20% and 11% over the deep learning methods Weight-KNN and adaptive hypergraph learning (AHL), respectively. Although CNN-2L does not improve the recall relative to the semantic extension model (SEM), it improves the composite F1 score by 1%.
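The top-K label localization step described above can be sketched as follows. This is an illustrative sketch only: the function name and example scores are hypothetical, not taken from the paper.

```python
import numpy as np

def localize_labels(probs, k):
    """Select the K highest-probability labels for one image.

    Unlike a fixed probability threshold, which can yield an empty
    label set when every predicted score is low, top-K selection
    always returns exactly K labels.
    """
    # Indices of the K largest probabilities, highest first.
    return [int(i) for i in np.argsort(probs)[::-1][:k]]

# Hypothetical scores over a 6-label vocabulary; all fall below a 0.5 threshold.
probs = np.array([0.12, 0.31, 0.05, 0.27, 0.09, 0.16])

threshold_labels = [i for i, p in enumerate(probs) if p > 0.5]  # empty label set
topk_labels = localize_labels(probs, k=3)                       # always 3 labels
```

Thresholding this example yields no labels at all, while top-K selection returns the three most probable ones; this is the empty-label-set problem the K-value strategy avoids.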
The experimental results reveal that the proposed transfer learning model based on a label localization strategy is effective for automatic image annotation and substantially boosts the multilabel image annotation performance.
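As a rough illustration of the squeeze-and-excitation channel reweighting mentioned in the abstract, the following is a minimal NumPy sketch. The tensor shapes, reduction ratio, and random weights are assumptions for demonstration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation reweighting of channel features.

    feature_map: (H, W, C) activations.
    w1: (C, C//r) and w2: (C//r, C) are the bottleneck FC weights
    (r is the reduction ratio; biases omitted for brevity).
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = feature_map.mean(axis=(0, 1))             # shape (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a gate per channel.
    s = np.maximum(z @ w1, 0.0)                   # shape (C//r,)
    gates = 1.0 / (1.0 + np.exp(-(s @ w2)))       # shape (C,), values in (0, 1)
    # Reweight: amplify useful channels, suppress useless ones.
    return feature_map * gates

C, r = 8, 4
w1 = rng.standard_normal((C, C // r))
w2 = rng.standard_normal((C // r, C))
x = rng.standard_normal((5, 5, C))
y = se_block(x, w1, w2)  # same shape as x, with channels rescaled
```

Because the gates lie in (0, 1), each channel is scaled down in proportion to its learned importance, which is the feature-reweighting effect the SE module contributes.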