Extracting Effective Image Attributes with Refined Universal Detection.

Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China.

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

Sensors (Basel). 2020 Dec 25;21(1):95. doi: 10.3390/s21010095.

Abstract

Recently, image attributes containing high-level semantic information have been widely used in computer vision tasks, including visual recognition and image captioning. Existing attribute extraction methods map visual concepts to the probabilities of frequently used words by directly applying Convolutional Neural Networks (CNNs). These methods typically have two main problems. First, words of different parts of speech (POSs) are handled in the same way, but non-nominal words can hardly be mapped to visual regions by CNNs alone. Second, synonymous nouns are treated as independent, distinct words, so their similarities are ignored. In this paper, a novel Refined Universal Detection (RUDet) method is proposed to solve these two problems. Specifically, a Refinement (RF) module is designed to extract refined attributes of non-nominal words based on the attributes of nominal words and visual features. In addition, a Word Tree (WT) module is constructed to integrate synonymous nouns, which ensures that similar words hold similar and more accurate probabilities. Moreover, a Feature Enhancement (FE) module is adopted to enhance the ability to mine different visual concepts at different scales. Experiments conducted on the large-scale Microsoft (MS) COCO dataset demonstrate the effectiveness of the proposed method.
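
The abstract does not say how the WT module integrates synonymous nouns, so the following is only a minimal sketch of the general idea that synonyms should end up with similar probabilities. It is not the authors' implementation. The function name `merge_synonyms`, the grouping dictionary, the blending weight `alpha`, and the example probabilities are all hypothetical.

```python
from typing import Dict, List


def merge_synonyms(
    word_probs: Dict[str, float],
    groups: Dict[str, List[str]],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Pull each word's probability toward the mean of its synonym group.

    word_probs: independent per-word probabilities (e.g. from a CNN head).
    groups:     hypothetical tree level mapping a parent concept to synonyms.
    alpha:      assumed blending weight; 0 keeps the input, 1 uses the group mean.
    """
    merged = dict(word_probs)
    for _parent, members in groups.items():
        # Only consider synonyms that actually have a predicted probability.
        present = [w for w in members if w in word_probs]
        if not present:
            continue
        group_mean = sum(word_probs[w] for w in present) / len(present)
        for w in present:
            merged[w] = (1 - alpha) * word_probs[w] + alpha * group_mean
    return merged


# Illustrative values only, not taken from the paper.
raw = {"bike": 0.9, "bicycle": 0.3, "dog": 0.2}
tree = {"cycle": ["bike", "bicycle"]}
smoothed = merge_synonyms(raw, tree, alpha=0.5)
```

With this blending, the gap between "bike" and "bicycle" shrinks while the unrelated word "dog" is left unchanged. The paper's WT module may use a different mechanism altogether.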
