Image Captioning and Visual Question Answering Based on Attributes and External Knowledge.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1367-1381. doi: 10.1109/TPAMI.2017.2708709. Epub 2017 May 26.

Abstract

Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper, we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement over the state of the art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high-level visual questions. Specifically, we design a visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. In particular, it allows questions to be asked where the image alone does not contain the information required to select the appropriate answer. Our final model achieves the best reported results for both image captioning and visual question answering on several of the major benchmark datasets.
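
The data flow described in the abstract (image features, then high-level concepts, then external knowledge, then a language model) can be illustrated with a small toy sketch. This is not the authors' implementation: the attribute names, weights, knowledge-base entries, and the function names `predict_attributes` and `retrieve_facts` are all hypothetical, and the CNN and RNN are replaced by a linear scorer and a plain dictionary so that the example stays self-contained.

```python
import math


def sigmoid(x):
    """Squash a real-valued score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_attributes(image_features, attribute_weights):
    """Score each attribute independently (multi-label), one probability per attribute.

    Stand-in for a CNN-based attribute predictor; here it is only a linear layer.
    """
    return [
        sigmoid(sum(w * f for w, f in zip(row, image_features)))
        for row in attribute_weights
    ]


def retrieve_facts(attr_probs, attribute_names, knowledge_base, top_k=2):
    """Use the top-k predicted attributes as queries into a knowledge base."""
    ranked = sorted(
        zip(attribute_names, attr_probs), key=lambda pair: pair[1], reverse=True
    )
    top_attributes = [name for name, _ in ranked[:top_k]]
    facts = [knowledge_base[name] for name in top_attributes if name in knowledge_base]
    return top_attributes, facts


# Toy data, invented for illustration only.
ATTRIBUTE_NAMES = ["umbrella", "dog", "rain"]
ATTRIBUTE_WEIGHTS = [
    [2.0, 0.0, 1.0],
    [-1.0, 0.0, -1.0],
    [0.5, 0.0, 0.5],
]
KNOWLEDGE_BASE = {
    "umbrella": "An umbrella is used for protection against rain.",
    "rain": "Rain is water falling from clouds.",
}

image_features = [1.0, 0.0, 2.0]  # stand-in for CNN image features
probs = predict_attributes(image_features, ATTRIBUTE_WEIGHTS)
top_attributes, facts = retrieve_facts(probs, ATTRIBUTE_NAMES, KNOWLEDGE_BASE)

# In a full model, this combined context would condition an RNN that
# generates the caption or the answer; that step is omitted here.
decoder_context = {"attributes": probs, "facts": facts}
print(top_attributes)
print(facts)
```

The point of the sketch is the intermediate representation: the language model would see a vector of concept probabilities plus retrieved facts, rather than raw image features alone.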

Similar articles

Insights into Object Semantics: Leveraging Transformer Networks for Advanced Image Captioning.

Sensors (Basel). 2024 Mar 11;24(6):1796. doi: 10.3390/s24061796.

Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network.

Sensors (Basel). 2022 Nov 1;22(21):8376. doi: 10.3390/s22218376.

From Deterministic to Generative: Multimodal Stochastic RNNs for Video Captioning.

IEEE Trans Neural Netw Learn Syst. 2019 Oct;30(10):3047-3058. doi: 10.1109/TNNLS.2018.2851077. Epub 2018 Aug 16.

Vision-to-Language Tasks Based on Attributes and Attention Mechanism.

IEEE Trans Cybern. 2021 Feb;51(2):913-926. doi: 10.1109/TCYB.2019.2914351. Epub 2021 Jan 15.

Hierarchical Representation Network With Auxiliary Tasks for Video Captioning and Video Question Answering.

IEEE Trans Image Process. 2022;31:202-215. doi: 10.1109/TIP.2021.3120867. Epub 2021 Dec 3.

FVQA: Fact-based Visual Question Answering.

IEEE Trans Pattern Anal Mach Intell. 2018 Oct;40(10):2413-2427. doi: 10.1109/TPAMI.2017.2754246. Epub 2017 Sep 19.

Image-Text Surgery: Efficient Concept Learning in Image Captioning by Generating Pseudopairs.

IEEE Trans Neural Netw Learn Syst. 2018 Dec;29(12):5910-5921. doi: 10.1109/TNNLS.2018.2813306. Epub 2018 Apr 5.

Learning Dual Encoding Model for Adaptive Visual Understanding in Visual Dialogue.

IEEE Trans Image Process. 2021;30:220-233. doi: 10.1109/TIP.2020.3034494. Epub 2020 Nov 18.

Learning Contextual Dependence With Convolutional Hierarchical Recurrent Neural Networks.

IEEE Trans Image Process. 2016 Jul;25(7):2983-2996. doi: 10.1109/TIP.2016.2548241.

Cited by

Supervised Deep Learning Techniques for Image Description: A Systematic Review.

Entropy (Basel). 2023 Mar 23;25(4):553. doi: 10.3390/e25040553.

Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network.

Sensors (Basel). 2022 Nov 1;22(21):8376. doi: 10.3390/s22218376.

Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images.

J Imaging. 2022 Oct 22;8(11):294. doi: 10.3390/jimaging8110294.

Single-Shot Object Detection via Feature Enhancement and Channel Attention.

Sensors (Basel). 2022 Sep 10;22(18):6857. doi: 10.3390/s22186857.

Exploiting Concepts of Instance Segmentation to Boost Detection in Challenging Environments.

Sensors (Basel). 2022 May 12;22(10):3703. doi: 10.3390/s22103703.

Deep Modular Bilinear Attention Network for Visual Question Answering.

Sensors (Basel). 2022 Jan 28;22(3):1045. doi: 10.3390/s22031045.

Survey and Performance Analysis of Deep Learning Based Object Detection in Challenging Environments.

Sensors (Basel). 2021 Jul 28;21(15):5116. doi: 10.3390/s21155116.

Extracting Effective Image Attributes with Refined Universal Detection.

Sensors (Basel). 2020 Dec 25;21(1):95. doi: 10.3390/s21010095.
