Image-Text Surgery: Efficient Concept Learning in Image Captioning by Generating Pseudopairs.

Authors

Fu Kun, Li Jin, Jin Junqi, Zhang Changshui

Publication information

IEEE Trans Neural Netw Learn Syst. 2018 Dec;29(12):5910-5921. doi: 10.1109/TNNLS.2018.2813306. Epub 2018 Apr 5.

DOI:10.1109/TNNLS.2018.2813306
PMID:29993667
Abstract

Image captioning aims to generate natural language sentences to describe the salient parts of a given image. Although neural networks have recently achieved promising results, a key problem is that they can only describe concepts seen in the training image-sentence pairs. Efficient learning of novel concepts has thus been a topic of recent interest to alleviate the expensive manpower of labeling data. In this paper, we propose a novel method, Image-Text Surgery, to synthesize pseudoimage-sentence pairs. The pseudopairs are generated under the guidance of a knowledge base, with syntax from a seed data set (i.e., MSCOCO) and visual information from an existing large-scale image base (i.e., ImageNet). Via pseudodata, the captioning model learns novel concepts without any corresponding human-labeled pairs. We further introduce adaptive visual replacement, which adaptively filters unnecessary visual features in pseudodata with an attention mechanism. We evaluate our approach on a held-out subset of the MSCOCO data set. The experimental results demonstrate that the proposed approach provides significant performance improvements over state-of-the-art methods in terms of F1 score and sentence quality. An ablation study and the qualitative results further validate the effectiveness of our approach.
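The abstract describes pseudopair synthesis only at a high level. What follows is a toy illustration under stated assumptions, not the authors' implementation. `KNOWLEDGE_BASE` is a hypothetical dictionary that maps a novel concept to a seen concept playing a similar sentence role. Images are reduced to plain feature vectors. `attention_filter` is generic dot-product soft attention that stands in for adaptive visual replacement, because the abstract does not give that step's exact formulation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Pair:
    """An image-sentence pair; a feature vector stands in for the image."""
    image_feature: list[float]
    sentence: str


# Hypothetical knowledge base (assumption for illustration): each novel
# concept maps to a seen concept that plays a similar role in sentences.
KNOWLEDGE_BASE = {"zebra": "horse", "pizza": "sandwich"}


def make_pseudopair(seed: Pair, novel: str, novel_feature: list[float],
                    kb: dict[str, str]) -> Pair | None:
    """Swap the seen concept for the novel one in both modalities.

    The sentence keeps the seed's syntax; the visual feature is taken from
    an image of the novel concept (e.g. drawn from a large image base).
    """
    seen = kb.get(novel)
    tokens = seed.sentence.split()
    if seen is None or seen not in tokens:
        return None  # this seed sentence cannot host the novel concept
    new_tokens = [novel if tok == seen else tok for tok in tokens]
    return Pair(image_feature=novel_feature, sentence=" ".join(new_tokens))


def attention_filter(regions: list[list[float]], query: list[float]) -> list[float]:
    """Generic soft attention: pool region features by relevance to a query.

    Regions with low dot-product scores get small softmax weights, so their
    features contribute little to the pooled vector.
    """
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) for r in regions]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * r[d] for w, r in zip(weights, regions))
            for d in range(len(query))]


if __name__ == "__main__":
    seed = Pair([0.1, 0.2], "a horse standing in a field")
    pseudo = make_pseudopair(seed, "zebra", [0.9, 0.4], KNOWLEDGE_BASE)
    print(pseudo.sentence)  # a zebra standing in a field
```

Swapping the whole image feature is the crudest option. The attention step shows how irrelevant region features could instead be down-weighted. The abstract gives this as the motivation for adaptive visual replacement.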

Similar articles

1. Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation.
IEEE Trans Image Process. 2021;30:1180-1192. doi: 10.1109/TIP.2020.3042086. Epub 2020 Dec 17.
2. More is Better: Precise and Detailed Image Captioning Using Online Positive Recall and Missing Concepts Mining.
IEEE Trans Image Process. 2019 Jan;28(1):32-44. doi: 10.1109/TIP.2018.2855415. Epub 2018 Jul 12.
3. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge.
IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):652-663. doi: 10.1109/TPAMI.2016.2587640. Epub 2016 Jul 7.
4. Style-Enhanced Transformer for Image Captioning in Construction Scenes.
Entropy (Basel). 2024 Mar 1;26(3):224. doi: 10.3390/e26030224.
5. Chinese Image Caption Generation via Visual Attention and Topic Modeling.
IEEE Trans Cybern. 2022 Feb;52(2):1247-1257. doi: 10.1109/TCYB.2020.2997034. Epub 2022 Feb 16.
6. Context-Aware Visual Policy Network for Fine-Grained Image Captioning.
IEEE Trans Pattern Anal Mach Intell. 2022 Feb;44(2):710-722. doi: 10.1109/TPAMI.2019.2909864. Epub 2022 Jan 7.
7. Novel concept-based image captioning models using LSTM and multi-encoder transformer architecture.
Sci Rep. 2024 Sep 5;14(1):20762. doi: 10.1038/s41598-024-69664-1.
8. Syntax Customized Video Captioning by Imitating Exemplar Sentences.
IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):10209-10221. doi: 10.1109/TPAMI.2021.3131618. Epub 2022 Nov 7.
9. Hierarchical LSTMs with Adaptive Attention for Visual Captioning.
IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1112-1131. doi: 10.1109/TPAMI.2019.2894139. Epub 2019 Jan 21.