Zhang Gangjian, Wei Shikui, Pang Huaxin, Qiu Shuang, Zhao Yao
IEEE Trans Image Process. 2022;31:5976-5988. doi: 10.1109/TIP.2022.3204213. Epub 2022 Sep 15.
Composed image retrieval aims to retrieve target images given a reference image and a piece of text. Handling this task requires reasonably modeling two subprocesses. One is erasing details of the reference image that are irrelevant to the text, and the other is replenishing the details the text asks for. Existing methods do not distinguish between these two subprocesses; instead, they combine them implicitly to solve the composed image retrieval task. To model the two subprocesses explicitly and in order, we propose a novel composed image retrieval method with three key components: a Multi-semantic Dynamic Suppression module (MDS), a Text-semantic Complementary Selection module (TCS), and Semantic Space Alignment constraints (SSA). Concretely, MDS erases irrelevant details of the reference image by suppressing its semantic features. TCS selects and enhances the semantic features of the text and then replenishes them into the reference image. Finally, to facilitate both the erasure and replenishment subprocesses, SSA aligns the semantics of the two modality features in the final space. Extensive experiments on three benchmark datasets (Shoes, FashionIQ, and Fashion200K) show that our approach outperforms state-of-the-art methods.
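The abstract does not specify how MDS, TCS, or SSA are built internally. Purely as an illustration of the "erase first, then replenish" ordering, the pure-Python sketch below assumes that image and text features share one dimension `d`, that erasure is a text-conditioned sigmoid gate that scales reference features down, that replenishment is a second sigmoid gate over text features, and that alignment is scored as one minus mean cosine similarity. Every function name (`suppress`, `select`, `compose`, `alignment_loss`, `retrieve`) and every formula here is hypothetical and is not the authors' implementation; in particular, the "multi-semantic" and "dynamic" aspects of MDS are not modeled.

```python
import math
import random

Vec = list[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: list[Vec], v: Vec) -> Vec:
    # Dense matrix-vector product: one dot product per output row.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def cosine(a: Vec, b: Vec) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def suppress(ref: Vec, txt: Vec, w_s: list[Vec]) -> Vec:
    # Hypothetical erasure step (stand-in for MDS): a gate in (0, 1),
    # conditioned on [ref; txt], shrinks each reference feature.
    gate = [sigmoid(z) for z in matvec(w_s, ref + txt)]
    return [r * (1.0 - g) for r, g in zip(ref, gate)]


def select(ref: Vec, txt: Vec, w_t: list[Vec]) -> Vec:
    # Hypothetical replenishment step (stand-in for TCS): a gate in (0, 1)
    # chooses how much of each text feature to inject.
    gate = [sigmoid(z) for z in matvec(w_t, ref + txt)]
    return [t * g for t, g in zip(txt, gate)]


def compose(ref: Vec, txt: Vec, w_s: list[Vec], w_t: list[Vec]) -> Vec:
    # Explicit, ordered composition: erase from the image, then add from the text.
    erased = suppress(ref, txt, w_s)
    added = select(ref, txt, w_t)
    return [e + a for e, a in zip(erased, added)]


def alignment_loss(img_feats: list[Vec], txt_feats: list[Vec]) -> float:
    # Hypothetical stand-in for SSA: 1 - mean cosine similarity of paired features.
    sims = [cosine(i, t) for i, t in zip(img_feats, txt_feats)]
    return 1.0 - sum(sims) / len(sims)


def retrieve(query: Vec, gallery: list[Vec]) -> list[int]:
    # Rank gallery indices by descending cosine similarity to the composed query.
    return sorted(range(len(gallery)), key=lambda k: -cosine(query, gallery[k]))


def rand_vec(n: int) -> Vec:
    return [random.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    random.seed(0)
    d = 4
    w_s = [rand_vec(2 * d) for _ in range(d)]  # gate weights act on [ref; txt]
    w_t = [rand_vec(2 * d) for _ in range(d)]
    ref, txt = rand_vec(d), rand_vec(d)
    query = compose(ref, txt, w_s, w_t)
    gallery = [rand_vec(d) for _ in range(5)] + [query]
    print(retrieve(query, gallery)[0])  # index 5, the query itself, ranks first
```

Because each gate lies strictly between 0 and 1, the erasure step can only shrink reference features and the selection step can only pass through part of each text feature; this keeps the two subprocesses separable, which is the property the abstract emphasizes. A real system would learn the gate weights and the alignment jointly rather than sample them at random.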