Department of Industrial Engineering, Yonsei University, Seoul, Republic of Korea.
PLoS One. 2024 Sep 9;19(9):e0310098. doi: 10.1371/journal.pone.0310098. eCollection 2024.
Conditional image retrieval (CIR), which retrieves images using a query image together with user-specified conditions, is essential in computer vision research for efficient image search and automated image analysis. Existing approaches, such as composed image retrieval (CoIR) methods, have been actively studied; however, they require either a triplet dataset or richly annotated image-text pairs, which are expensive to obtain. In this work, we demonstrate that CIR at the level of image concepts can be achieved with an inverse mapping approach that explores the model's inductive knowledge. Our proposed CIR method, called Backward Search, updates the query embedding to conform to the condition. Specifically, the embedding of the query image is updated by predicting label probabilities and minimizing their difference from the condition label, which enables CIR with image-level concepts while preserving the context of the query. The Backward Search method supports both single- and multi-conditional image retrieval. Moreover, we reduce computation time through knowledge distillation. We conduct experiments on the WikiArt, aPY, and CUB benchmark datasets. The proposed method achieves an average mAP@10 of 0.541 across these datasets, a marked improvement over the CoIR methods in our comparative experiments. Furthermore, with the Backward Search model as the teacher in knowledge distillation, the student model runs up to 160 times faster with only a slight decrease in performance. The implementation of our method is available at the following URL: https://github.com/dhlee-work/BackwardSearch.
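The following is a minimal sketch, not the authors' implementation, of the inverse-mapping idea described in the abstract: a query embedding is iteratively updated so that a frozen label predictor assigns high probability to the user-specified condition label, while a proximity term keeps the embedding close to the original query to preserve its context, after which retrieval proceeds by nearest-neighbor search. All names, dimensions, and hyperparameters here (label_head, embed_dim, alpha, etc.) are illustrative assumptions.

```python
# Hedged sketch of a backward-search-style update over embeddings (assumed details).
import torch
import torch.nn.functional as F

embed_dim, num_labels = 512, 27                       # assumed embedding size and label count
label_head = torch.nn.Linear(embed_dim, num_labels)   # frozen classifier over embeddings
label_head.requires_grad_(False)

def backward_search(query_emb, condition_label, steps=100, lr=0.05, alpha=0.1):
    """Update the query embedding toward the condition label (illustrative only)."""
    z = query_emb.clone().detach().requires_grad_(True)
    target = torch.tensor([condition_label])
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = label_head(z)                          # predict label probabilities
        loss = F.cross_entropy(logits, target)          # push prediction toward the condition label
        loss = loss + alpha * F.mse_loss(z, query_emb)  # stay close to the original query context
        loss.backward()
        optimizer.step()
    return z.detach()

# Retrieval: rank gallery embeddings by cosine similarity to the updated query.
gallery = F.normalize(torch.randn(1000, embed_dim), dim=1)  # placeholder gallery embeddings
query = torch.randn(1, embed_dim)                           # placeholder query embedding
z_cond = F.normalize(backward_search(query, condition_label=3), dim=1)
top10 = (gallery @ z_cond.t()).squeeze(1).topk(10).indices
```

In this reading, the classifier head supplies the "inductive knowledge" that the abstract refers to, and the distilled student model mentioned there would amortize the iterative optimization into a single forward pass; both points are interpretations of the abstract rather than details confirmed by the paper's code.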