Suppr 超能文献


Memorial GAN With Joint Semantic Optimization for Unpaired Image Captioning

Authors

Song Peipei, Guo Dan, Zhou Jinxing, Xu Mingliang, Wang Meng

Publication

IEEE Trans Cybern. 2023 Jul;53(7):4388-4399. doi: 10.1109/TCYB.2022.3175012. Epub 2023 Jun 15.

DOI: 10.1109/TCYB.2022.3175012
PMID: 35635832
Abstract

Most works of image captioning are implemented under the full supervision of paired image-caption data. Limited to expensive cost of data collection, the task of unpaired image captioning has attracted researchers' attention. In this article, we propose a novel memorial GAN (MemGAN) with the joint semantic optimization for unpaired image captioning. The core idea is to explore implicit semantic correlation between disjointed images and sentences through building a multimodal semantic-aware space (SAS). Concretely, each modality is mapped into a unified multimodal SAS, where SAS includes the semantic vectors of image I , visual concepts O , unpaired sentence S , and the generated caption C . We adopt the memory unit based on multihead attention and relational gate as a backbone to preserve and transit crucial multimodal semantics in the SAS for image caption generation and sentence reconstruction. Then, the memory unit is embedded into a GAN framework to exploit the semantic similarity and relevance in SAS, that is, imposing a joint semantic-aware optimization on SAS without supervision clues. To summarize, the proposed MemGAN learns the latent semantic relevance of SAS's multimodalities in an adversarial manner. Extensive experiments and qualitative results demonstrate the effectiveness of MemGAN, achieving improvements over state of the arts on unpaired image captioning benchmarks.
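The abstract describes the backbone as a memory unit built from multihead attention and a relational gate: the unit attends over incoming semantic vectors, then a sigmoid gate decides how much of the attended content overwrites each memory slot. A minimal NumPy sketch of one such gated read/update step is below; all names, shapes, and parameters here are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_update(memory, inputs, w_gate):
    """One gated read/update step of a memory unit (illustrative sketch).

    memory : (M, D) array of memory slots
    inputs : (T, D) array of incoming semantic vectors (e.g. from the SAS)
    w_gate : (2*D, D) gate weights (hypothetical learned parameters)
    """
    d = memory.shape[-1]
    # Scaled dot-product attention: each slot attends over the inputs
    attn = softmax(memory @ inputs.T / np.sqrt(d))          # (M, T)
    read = attn @ inputs                                    # (M, D) attended content
    # Relational gate: sigmoid over [slot, read] decides how much to update
    g = 1.0 / (1.0 + np.exp(-np.concatenate([memory, read], axis=-1) @ w_gate))
    # Gated interpolation between new content and old slot state
    return g * read + (1.0 - g) * memory                    # (M, D)
```

The sketch uses single-head attention for brevity; a multihead version would split the `D` dimension into per-head subspaces before the same attend-read-gate pattern.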


Similar Articles

1. Mining core information by evaluating semantic importance for unpaired image captioning.
   Neural Netw. 2024 Nov;179:106519. doi: 10.1016/j.neunet.2024.106519. Epub 2024 Jul 9.
2. Center-enhanced video captioning model with multimodal semantic alignment.
   Neural Netw. 2024 Dec;180:106744. doi: 10.1016/j.neunet.2024.106744. Epub 2024 Sep 18.
3. Syntax Customized Video Captioning by Imitating Exemplar Sentences.
   IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):10209-10221. doi: 10.1109/TPAMI.2021.3131618. Epub 2022 Nov 7.
4. Image Captioning with End-to-end Attribute Detection and Subsequent Attributes Prediction.
   IEEE Trans Image Process. 2020 Jan 30. doi: 10.1109/TIP.2020.2969330.
5. Context-Fused Guidance for Image Captioning Using Sequence-Level Training.
   Comput Intell Neurosci. 2022 Jan 5;2022:9743123. doi: 10.1155/2022/9743123. eCollection 2022.
6. Re-Caption: Saliency-Enhanced Image Captioning through Two-Phase Learning.
   IEEE Trans Image Process. 2019 Jul 17. doi: 10.1109/TIP.2019.2928144.
7. Automatically Generating Natural Language Descriptions of Images by a Deep Hierarchical Framework.
   IEEE Trans Cybern. 2022 Aug;52(8):7441-7452. doi: 10.1109/TCYB.2020.3041595. Epub 2022 Jul 19.
8. LCM-Captioner: A lightweight text-based image captioning method with collaborative mechanism between vision and text.
   Neural Netw. 2023 May;162:318-329. doi: 10.1016/j.neunet.2023.03.010. Epub 2023 Mar 11.
9. Exploiting Cross-Modal Prediction and Relation Consistency for Semisupervised Image Captioning.
   IEEE Trans Cybern. 2024 Feb;54(2):890-902. doi: 10.1109/TCYB.2022.3156367. Epub 2024 Jan 17.