Image-Text Surgery: Efficient Concept Learning in Image Captioning by Generating Pseudopairs.

Authors

Fu Kun, Li Jin, Jin Junqi, Zhang Changshui

Publication information

IEEE Trans Neural Netw Learn Syst. 2018 Dec;29(12):5910-5921. doi: 10.1109/TNNLS.2018.2813306. Epub 2018 Apr 5.

Abstract

Image captioning aims to generate natural language sentences that describe the salient parts of a given image. Although neural networks have recently achieved promising results, a key problem is that they can only describe concepts seen in the training image-sentence pairs. Efficient learning of novel concepts has therefore attracted recent interest as a way to reduce the high labor cost of labeling data. In this paper, we propose a novel method, Image-Text Surgery, to synthesize pseudo image-sentence pairs. The pseudopairs are generated under the guidance of a knowledge base, with syntax from a seed data set (i.e., MSCOCO) and visual information from an existing large-scale image base (i.e., ImageNet). Using this pseudodata, the captioning model learns novel concepts without any corresponding human-labeled pairs. We further introduce adaptive visual replacement, which uses an attention mechanism to filter out unnecessary visual features in the pseudodata. We evaluate our approach on a held-out subset of the MSCOCO data set. The experimental results demonstrate that the proposed approach provides significant performance improvements over state-of-the-art methods in terms of F1 score and sentence quality. An ablation study and qualitative results further validate the effectiveness of our approach.
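
The text side of pseudopair generation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy knowledge base, the word-swap rule, the helper names (text_surgery, make_pseudopairs, PseudoPair), and the image identifiers are all hypothetical, and the visual side (feature replacement with attention) is left out entirely. The sketch only assumes that a knowledge base maps each novel concept to similar seen concepts, that seed captions supply the sentence structure, and that an external image base supplies images of the novel concept.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PseudoPair:
    caption: str   # seed caption with the seen concept swapped for the novel one
    image_id: str  # identifier of an image taken from the external image base


def text_surgery(caption: str, seen: str, novel: str) -> Optional[str]:
    """Swap a seen concept word for a novel one; return None if it is absent."""
    tokens = caption.split()
    if seen not in tokens:
        return None
    return " ".join(novel if tok == seen else tok for tok in tokens)


def make_pseudopairs(
    seed_captions: List[str],
    image_base: Dict[str, List[str]],
    knowledge_base: Dict[str, List[str]],
) -> List[PseudoPair]:
    """Pair rewritten seed captions with images of the novel concept.

    knowledge_base maps a novel concept to seen concepts assumed to be
    similar (hypothetical structure); image_base maps a concept to image ids.
    """
    pairs: List[PseudoPair] = []
    for novel, similar_seen in knowledge_base.items():
        images = image_base.get(novel, [])
        if not images:
            continue  # no visual data for this concept, so nothing to pair
        for caption in seed_captions:
            for seen in similar_seen:
                new_caption = text_surgery(caption, seen, novel)
                if new_caption is None:
                    continue
                # Cycle through the available images in a simple round-robin.
                image_id = images[len(pairs) % len(images)]
                pairs.append(PseudoPair(new_caption, image_id))
    return pairs


if __name__ == "__main__":
    seeds = ["a horse stands in a field", "a dog sits on a couch"]
    images = {"zebra": ["img_001", "img_002"]}
    kb = {"zebra": ["horse"]}
    for pair in make_pseudopairs(seeds, images, kb):
        print(pair.image_id, "|", pair.caption)
```

In this toy setting only the first seed caption mentions a concept related to "zebra", so a single pseudopair is produced. A captioning model trained on such pairs would see the novel word in a plausible syntactic context without any human-written caption for it, which is the idea the abstract describes.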
