

Backward induction-based deep image search.

Affiliations

Department of Industrial Engineering, Yonsei University, Seoul, Republic of Korea.

Publication Information

PLoS One. 2024 Sep 9;19(9):e0310098. doi: 10.1371/journal.pone.0310098. eCollection 2024.

Abstract

Conditional image retrieval (CIR), which involves retrieving images by a query image along with user-specified conditions, is essential in computer vision research for efficient image search and automated image analysis. The existing approaches, such as composed image retrieval (CoIR) methods, have been actively studied. However, these methods face challenges as they require either a triplet dataset or richly annotated image-text pairs, which are expensive to obtain. In this work, we demonstrate that CIR at the image-level concept can be achieved using an inverse mapping approach that explores the model's inductive knowledge. Our proposed CIR method, called Backward Search, updates the query embedding to conform to the condition. Specifically, the embedding of the query image is updated by predicting the probability of the label and minimizing the difference from the condition label. This enables CIR with image-level concepts while preserving the context of the query. In this paper, we introduce the Backward Search method that enables single and multi-conditional image retrieval. Moreover, we efficiently reduce the computation time by distilling the knowledge. We conduct experiments using the WikiArt, aPY, and CUB benchmark datasets. The proposed method achieves an average mAP@10 of 0.541 on the datasets, demonstrating a marked improvement compared to the CoIR methods in our comparative experiments. Furthermore, by employing knowledge distillation with the Backward Search model as the teacher, the student model achieves a significant reduction in computation time, up to 160 times faster with only a slight decrease in performance. The implementation of our method is available at the following URL: https://github.com/dhlee-work/BackwardSearch.
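The core idea described above, updating the query embedding by gradient descent so that a classifier's predicted label probabilities match the condition label, can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the authors' implementation: it stands in a linear classifier (`W`) for the paper's trained model, uses random vectors as the image-embedding gallery, and does retrieval by cosine similarity.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D logit vector
    e = np.exp(x - x.max())
    return e / e.sum()

def backward_search(z, W, target, lr=0.05, steps=500):
    """Toy version of the embedding update: descend the cross-entropy
    between the classifier's prediction and the condition label `target`,
    moving the query embedding z toward the condition while starting
    from (and so partly preserving) the original query context."""
    z = z.copy()
    onehot = np.zeros(W.shape[1])
    onehot[target] = 1.0
    for _ in range(steps):
        p = softmax(z @ W)        # predicted label probabilities
        grad = W @ (p - onehot)   # gradient of cross-entropy w.r.t. z
        z -= lr * grad
    return z

def retrieve(z, gallery, k=3):
    """Cosine-similarity top-k retrieval over a gallery of embeddings."""
    zn = z / np.linalg.norm(z)
    gn = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(gn @ zn))[:k]

# Synthetic stand-ins for a trained classifier and an embedded gallery.
rng = np.random.default_rng(0)
d, n_classes, n_gallery = 16, 4, 100
W = rng.normal(size=(d, n_classes))
z_query = rng.normal(size=d)
gallery = rng.normal(size=(n_gallery, d))

z_new = backward_search(z_query, W, target=2)  # condition: label 2
top = retrieve(z_new, gallery)
```

After the update, the classifier assigns the condition label to the modified embedding, which is then used for nearest-neighbor search; multi-conditional retrieval would correspond to descending a sum of such losses over several condition labels.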


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/687d/11383237/ed1aa41dddbe/pone.0310098.g001.jpg
