Liu Fang, Deng Xiaoming, Zou Changqing, Lai Yu-Kun, Chen Keqi, Zuo Ran, Ma Cuixia, Liu Yong-Jin, Wang Hongan
IEEE Trans Image Process. 2022;31:3737-3751. doi: 10.1109/TIP.2022.3175403. Epub 2022 May 26.
Sketch-based image retrieval (SBIR) is a long-standing research topic in computer vision. Existing methods mainly focus on category-level or instance-level image retrieval. This paper investigates the fine-grained scene-level SBIR problem, where a free-hand sketch depicting a scene is used to retrieve desired images. This problem is useful yet challenging mainly because of two entangled facts: 1) achieving an effective representation of the input query data and scene-level images is difficult, as it requires modeling information across multiple modalities such as object layout, relative size, and visual appearance, and 2) there is a large domain gap between the query sketch input and target images. We present SceneSketcher-v2, a Graph Convolutional Network (GCN) based architecture to address these challenges. SceneSketcher-v2 employs a carefully designed graph convolutional network to fuse the multi-modality information in the query sketch and target images, and uses a triplet training process in an end-to-end manner to alleviate the domain gap. Extensive experiments demonstrate that SceneSketcher-v2 outperforms state-of-the-art scene-level SBIR models by a significant margin.
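The abstract does not specify the exact form of the triplet objective. As a rough illustration only, a standard triplet hinge loss over sketch and image embeddings, which pulls a sketch embedding toward its matching image and pushes it away from a non-matching one, can be sketched as follows (the function name, margin value, and toy embeddings are all assumptions, not details from the paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: encourage the sketch embedding (anchor)
    to be closer to the matching image (positive) than to a non-matching
    image (negative) by at least `margin`. Hypothetical helper for
    illustration; not the paper's exact formulation."""
    d_pos = np.linalg.norm(anchor - positive)  # sketch-to-match distance
    d_neg = np.linalg.norm(anchor - negative)  # sketch-to-non-match distance
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: the positive is much closer to the anchor.
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([-1.0, 0.0])
print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
```

When the triplet is well-separated the loss is zero, so training gradients come only from triplets that still violate the margin.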