
Augmented Multimodality Fusion for Generalized Zero-Shot Sketch-Based Visual Retrieval

Authors

Jing Taotao, Xia Haifeng, Hamm Jihun, Ding Zhengming

Publication

IEEE Trans Image Process. 2022;31:3657-3668. doi: 10.1109/TIP.2022.3173815. Epub 2022 May 26.

Abstract

Zero-shot sketch-based image retrieval (ZS-SBIR) has attracted great attention recently because of the potential of sketch-based retrieval in zero-shot scenarios, where the categories of query sketches and gallery photos are not observed during training. However, the more general and practical scenario, in which query sketches and gallery photos contain both seen and unseen categories, remains insufficiently explored. This problem is defined as generalized zero-shot sketch-based image retrieval (GZS-SBIR) and is the focus of this work. To this end, we propose a novel Augmented Multi-modality Fusion (AMF) framework to generalize seen concepts to unobserved ones efficiently. Specifically, a novel knowledge discovery module named cross-domain augmentation is designed in both the visual and semantic spaces to mimic novel knowledge unseen during training, which is the key to handling the GZS-SBIR challenge. Moreover, a triplet domain alignment module is proposed to couple the cross-domain distributions of photos and sketches in the visual space. To enhance the robustness of our model, we explore embedding propagation to refine both visual and semantic features by removing undesired noise. Finally, the visual-semantic fusion representations are concatenated for further domain discrimination and task-specific recognition, which encourages cross-domain alignment in both the visual and semantic feature spaces. Experimental evaluations are conducted on popular ZS-SBIR benchmarks as well as a new evaluation protocol designed for GZS-SBIR on the DomainNet dataset, which offers more diverse sub-domains, and the promising results demonstrate the superiority of the proposed solution over other baselines. The source code is available at https://github.com/scottjingtt/AMF_GZS_SBIR.git.
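The triplet domain alignment idea mentioned in the abstract can be illustrated with a standard triplet margin loss over sketch and photo embeddings: a sketch (anchor) is pulled toward a photo of the same class and pushed away from a photo of a different class. This is a minimal pure-Python sketch under assumed Euclidean distance and an illustrative margin value, not the paper's exact formulation:

```python
import math

def triplet_alignment_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one (sketch, photo+, photo-) triplet:
    pull the sketch embedding toward the same-class photo and push it at
    least `margin` farther from a different-class photo."""
    d_pos = math.dist(anchor, positive)  # distance to matching photo
    d_neg = math.dist(anchor, negative)  # distance to mismatched photo
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings: the sketch lies near its matching photo and far from
# the mismatched one, so the margin constraint is satisfied and the loss is 0.
sketch = (1.0, 0.0)
photo_same = (0.9, 0.1)
photo_diff = (-1.0, 0.0)
loss = triplet_alignment_loss(sketch, photo_same, photo_diff)  # → 0.0
```

In practice such a loss would be averaged over mini-batches of cross-domain triplets and minimized jointly with the framework's other objectives; the actual distance metric, margin, and triplet mining strategy are design choices of the original method.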

