Semantics Disentangling for Cross-Modal Retrieval

Authors

Wang Zheng, Xu Xing, Wei Jiwei, Xie Ning, Yang Yang, Shen Heng Tao

Publication

IEEE Trans Image Process. 2024;33:2226-2237. doi: 10.1109/TIP.2024.3374111. Epub 2024 Mar 25.

DOI: 10.1109/TIP.2024.3374111
PMID: 38470583
Abstract

Cross-modal retrieval (e.g., querying with an image to obtain a semantically similar sentence, and vice versa) is an important but challenging task, as a heterogeneous gap and inconsistent distributions exist between modalities. The dominant approaches attempt to bridge this heterogeneity by capturing common representations of the heterogeneous data in a constructed subspace that reflects semantic closeness. However, they give insufficient consideration to the fact that the learned latent representations are heavily entangled with semantic-unrelated features, which further compounds the challenges of cross-modal retrieval. To alleviate this difficulty, this work assumes that the data are jointly characterized by two independent features: semantic-shared and semantic-unrelated representations. The former captures the consistent semantics shared across modalities, while the latter reflects modality-specific characteristics unrelated to semantics, such as background, illumination, and other low-level information. This paper therefore aims to disentangle the shared semantics from the entangled features, so that the purer semantic representation can promote the closeness of paired data. Specifically, it designs a novel Semantics Disentangling approach for Cross-Modal Retrieval (termed SDCMR) that explicitly decouples the two kinds of features based on a variational auto-encoder. Reconstruction is then performed by exchanging the shared semantics between modalities to enforce the learning of semantic consistency. Moreover, a dual adversarial mechanism is designed to disentangle the two independent features via a pushing-and-pulling strategy. Comprehensive experiments on four widely used datasets demonstrate the effectiveness and superiority of the proposed SDCMR method, which sets a new performance bar against 15 state-of-the-art methods.
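The cross-reconstruction step the abstract describes (decode each modality after swapping the semantic-shared codes between modalities) can be sketched in a toy form. This is a minimal NumPy illustration under invented assumptions, not the paper's implementation: all dimensions, the linear stand-ins for the trained VAE encoders/decoders, and the function names are hypothetical, and the VAE machinery and dual adversarial losses are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): each modality is embedded
# in a 16-d feature space; codes split into an 8-d semantic-shared part
# and an 8-d semantic-unrelated part.
D, DS, DU = 16, 8, 8

def make_linear(d_in, d_out):
    """Random linear map standing in for a trained encoder/decoder."""
    return rng.normal(scale=0.1, size=(d_in, d_out))

# Per-modality encoders: one head for shared semantics, one for the
# modality-specific, semantic-unrelated features.
enc = {m: {"shared": make_linear(D, DS), "unrel": make_linear(D, DU)}
       for m in ("image", "text")}
# Per-modality decoders reconstruct from the concatenated codes.
dec = {m: make_linear(DS + DU, D) for m in ("image", "text")}

def encode(m, x):
    """Split modality m's features into (shared, unrelated) codes."""
    return x @ enc[m]["shared"], x @ enc[m]["unrel"]

def decode(m, z_s, z_u):
    """Reconstruct modality m's features from the two codes."""
    return np.concatenate([z_s, z_u], axis=-1) @ dec[m]

def cross_reconstruction_loss(x_img, x_txt):
    """Swap shared codes between modalities before decoding.

    If the shared code really carries the common semantics, each decoder
    should still reconstruct its own modality after the swap; training
    against this loss pushes semantics into the shared code.
    """
    zs_i, zu_i = encode("image", x_img)
    zs_t, zu_t = encode("text", x_txt)
    rec_img = decode("image", zs_t, zu_i)  # image rebuilt from text semantics
    rec_txt = decode("text", zs_i, zu_t)   # text rebuilt from image semantics
    return float(np.mean((rec_img - x_img) ** 2) +
                 np.mean((rec_txt - x_txt) ** 2))

# Toy paired batch of 4 image/text feature vectors.
x_img = rng.normal(size=(4, D))
x_txt = rng.normal(size=(4, D))
loss = cross_reconstruction_loss(x_img, x_txt)
print(f"cross-reconstruction loss: {loss:.4f}")
```

In the actual method this loss would be minimized jointly with the VAE and adversarial objectives, so that only genuinely shared semantics survive in the swapped code.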


Similar Articles

1
Semantics Disentangling for Cross-Modal Retrieval.
IEEE Trans Image Process. 2024;33:2226-2237. doi: 10.1109/TIP.2024.3374111. Epub 2024 Mar 25.
2
Ternary Adversarial Networks With Self-Supervision for Zero-Shot Cross-Modal Retrieval.
IEEE Trans Cybern. 2020 Jun;50(6):2400-2413. doi: 10.1109/TCYB.2019.2928180. Epub 2019 Jul 24.
3
Learning Cross-Modal Common Representations by Private-Shared Subspaces Separation.
IEEE Trans Cybern. 2022 May;52(5):3261-3275. doi: 10.1109/TCYB.2020.3009004. Epub 2022 May 19.
4
Structure-aware contrastive hashing for unsupervised cross-modal retrieval.
Neural Netw. 2024 Jun;174:106211. doi: 10.1016/j.neunet.2024.106211. Epub 2024 Feb 27.
5
MHTN: Modal-Adversarial Hybrid Transfer Network for Cross-Modal Retrieval.
IEEE Trans Cybern. 2020 Mar;50(3):1047-1059. doi: 10.1109/TCYB.2018.2879846. Epub 2018 Dec 5.
6
Modality independent adversarial network for generalized zero shot image classification.
Neural Netw. 2021 Feb;134:11-22. doi: 10.1016/j.neunet.2020.11.007. Epub 2020 Nov 21.
7
Keyword-Based Diverse Image Retrieval With Variational Multiple Instance Graph.
IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10528-10537. doi: 10.1109/TNNLS.2022.3168431. Epub 2023 Nov 30.
8
Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval.
IEEE Trans Image Process. 2020 Jan 9. doi: 10.1109/TIP.2020.2963957.
9
Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network.
IEEE Trans Image Process. 2018 Jul 2. doi: 10.1109/TIP.2018.2852503.
10
Unsupervised Dual Deep Hashing With Semantic-Index and Content-Code for Cross-Modal Retrieval.
IEEE Trans Pattern Anal Mach Intell. 2025 Jan;47(1):387-399. doi: 10.1109/TPAMI.2024.3467130. Epub 2024 Dec 4.

Cited By

1
Visual delta generation with large multi-modal models enhances composed image retrieval using unlabeled data.
Sci Rep. 2025 Jul 28;15(1):27463. doi: 10.1038/s41598-025-07798-6.