Image-Text Surgery: Efficient Concept Learning in Image Captioning by Generating Pseudopairs.

Author information

Fu Kun, Li Jin, Jin Junqi, Zhang Changshui

Publication information

IEEE Trans Neural Netw Learn Syst. 2018 Dec;29(12):5910-5921. doi: 10.1109/TNNLS.2018.2813306. Epub 2018 Apr 5.

DOI:10.1109/TNNLS.2018.2813306

Abstract

Image captioning aims to generate natural language sentences to describe the salient parts of a given image. Although neural networks have recently achieved promising results, a key problem is that they can only describe concepts seen in the training image-sentence pairs. Efficient learning of novel concepts has thus been a topic of recent interest to alleviate the expensive manpower of labeling data. In this paper, we propose a novel method, Image-Text Surgery, to synthesize pseudoimage-sentence pairs. The pseudopairs are generated under the guidance of a knowledge base, with syntax from a seed data set (i.e., MSCOCO) and visual information from an existing large-scale image base (i.e., ImageNet). Via pseudodata, the captioning model learns novel concepts without any corresponding human-labeled pairs. We further introduce adaptive visual replacement, which adaptively filters unnecessary visual features in pseudodata with an attention mechanism. We evaluate our approach on a held-out subset of the MSCOCO data set. The experimental results demonstrate that the proposed approach provides significant performance improvements over state-of-the-art methods in terms of F1 score and sentence quality. An ablation study and the qualitative results further validate the effectiveness of our approach.
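The text side of the pseudopair idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the knowledge base entries, file names, and the `make_pseudopairs` helper are all hypothetical, and the paper's handling of visual features (including adaptive visual replacement) is not modeled. The sketch only shows how a seed caption's syntax might be reused for a novel concept and paired with an image drawn from a separate image base.

```python
import random

# Hypothetical knowledge base: maps a novel concept to a related seed concept.
# These entries are illustrative assumptions, not taken from the paper.
KNOWLEDGE_BASE = {"zebra": "horse", "bus": "truck"}

# Hypothetical seed pairs standing in for human-labeled MSCOCO data.
SEED_PAIRS = [
    ("coco_001.jpg", "a horse stands in a grassy field"),
    ("coco_002.jpg", "a truck parked on the street"),
]

# Hypothetical image base standing in for ImageNet, keyed by concept.
IMAGE_BASE = {"zebra": ["inet_zebra_1.jpg"], "bus": ["inet_bus_1.jpg"]}


def make_pseudopairs(novel, rng):
    """Build (image, sentence) pseudopairs for a novel concept.

    The sentence reuses the syntax of a seed caption, with the related seed
    concept swapped for the novel one; the image is sampled from the image base.
    """
    seed_concept = KNOWLEDGE_BASE[novel]
    pairs = []
    for _seed_image, caption in SEED_PAIRS:
        words = caption.split()
        if seed_concept not in words:
            continue  # this caption does not mention the related seed concept
        sentence = " ".join(novel if w == seed_concept else w for w in words)
        image = rng.choice(IMAGE_BASE[novel])
        pairs.append((image, sentence))
    return pairs


print(make_pseudopairs("zebra", random.Random(0)))
```

No human-labeled caption for the novel concept is needed in this sketch; only the seed captions and the concept-level image labels are used.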


Similar articles

Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation.

IEEE Trans Image Process. 2021;30:1180-1192. doi: 10.1109/TIP.2020.3042086. Epub 2020 Dec 17.

More is Better: Precise and Detailed Image Captioning Using Online Positive Recall and Missing Concepts Mining.

IEEE Trans Image Process. 2019 Jan;28(1):32-44. doi: 10.1109/TIP.2018.2855415. Epub 2018 Jul 12.

Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge.

IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):652-663. doi: 10.1109/TPAMI.2016.2587640. Epub 2016 Jul 7.

Style-Enhanced Transformer for Image Captioning in Construction Scenes.

Entropy (Basel). 2024 Mar 1;26(3):224. doi: 10.3390/e26030224.

Chinese Image Caption Generation via Visual Attention and Topic Modeling.

IEEE Trans Cybern. 2022 Feb;52(2):1247-1257. doi: 10.1109/TCYB.2020.2997034. Epub 2022 Feb 16.

Context-Aware Visual Policy Network for Fine-Grained Image Captioning.

IEEE Trans Pattern Anal Mach Intell. 2022 Feb;44(2):710-722. doi: 10.1109/TPAMI.2019.2909864. Epub 2022 Jan 7.

Novel concept-based image captioning models using LSTM and multi-encoder transformer architecture.

Sci Rep. 2024 Sep 5;14(1):20762. doi: 10.1038/s41598-024-69664-1.

Syntax Customized Video Captioning by Imitating Exemplar Sentences.

IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):10209-10221. doi: 10.1109/TPAMI.2021.3131618. Epub 2022 Nov 7.

Hierarchical LSTMs with Adaptive Attention for Visual Captioning.

IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1112-1131. doi: 10.1109/TPAMI.2019.2894139. Epub 2019 Jan 21.
