Image Captioning and Visual Question Answering Based on Attributes and External Knowledge.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1367-1381. doi: 10.1109/TPAMI.2017.2708709. Epub 2017 May 26.

Abstract

Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper, we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state of the art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Specifically, we design a visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. In particular, it allows questions to be asked when the image alone does not contain the information required to select the appropriate answer. Our final model achieves the best reported results for both image captioning and visual question answering on several of the major benchmark datasets.
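
The data flow described in the abstract (an explicit high-level concept representation of the image, combined with text retrieved from a knowledge base, passed on to a language model together with the question) can be illustrated with a toy sketch. This is not the authors' implementation. The attribute vocabulary, the in-memory knowledge base, the top-k selection rule, and all function names below are hypothetical and chosen only for illustration. The abstract does not specify any of them, and the CNN attribute predictor and RNN decoder are omitted entirely.

```python
from typing import Dict, List

# Hypothetical attribute vocabulary; the abstract does not list one.
ATTRIBUTE_VOCAB: List[str] = ["dog", "umbrella", "grass", "running", "red"]

# Hypothetical stand-in for a general knowledge base (assumption: plain text facts).
TOY_KNOWLEDGE_BASE: Dict[str, str] = {
    "dog": "The dog is a domesticated animal.",
    "umbrella": "An umbrella is a canopy that protects against rain or sun.",
}


def attribute_vector(scores: Dict[str, float]) -> List[float]:
    """Return a fixed-length vector of attribute confidences in vocabulary order."""
    return [scores.get(name, 0.0) for name in ATTRIBUTE_VOCAB]


def top_attributes(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k attribute names with the highest confidence."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def retrieve_knowledge(attributes: List[str]) -> List[str]:
    """Look up a text fact for each attribute that the toy knowledge base covers."""
    return [TOY_KNOWLEDGE_BASE[a] for a in attributes if a in TOY_KNOWLEDGE_BASE]


def build_decoder_input(scores: Dict[str, float], question: str, k: int = 2) -> dict:
    """Bundle the image attributes, retrieved facts, and question for a decoder.

    A real system would embed each part as a vector before feeding an RNN;
    here the parts are kept as plain Python values.
    """
    return {
        "attributes": attribute_vector(scores),
        "knowledge": retrieve_knowledge(top_attributes(scores, k)),
        "question": question,
    }


# Example: scores that a (hypothetical) CNN attribute predictor might output.
example_scores = {"dog": 0.9, "umbrella": 0.7, "grass": 0.2}
example_input = build_decoder_input(example_scores, "Why is the umbrella open?")
```

The point of the sketch is the separation of concerns: the image is summarized as explicit concepts before any text is generated, and those same concepts act as the query into external knowledge, so a question can be answered even when the needed fact is not visible in the image.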
