
Extracting Effective Image Attributes with Refined Universal Detection

Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China.

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.

Publication

Sensors (Basel). 2020 Dec 25;21(1):95. doi: 10.3390/s21010095.

DOI: 10.3390/s21010095
PMID: 33375715
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7795811/
Abstract

Recently, image attributes containing high-level semantic information have been widely used in computer vision tasks, including visual recognition and image captioning. Existing attribute extraction methods map visual concepts to the probabilities of frequently-used words by directly using Convolutional Neural Networks (CNNs). Typically, two main problems exist in those methods. First, words of different parts of speech (POSs) are handled in the same way, but non-nominal words can hardly be mapped to visual regions through CNNs only. Second, synonymous nominal words are treated as independent and different words, in which similarities are ignored. In this paper, a novel Refined Universal Detection (RUDet) method is proposed to solve these two problems. Specifically, a Refinement (RF) module is designed to extract refined attributes of non-nominal words based on the attributes of nominal words and visual features. In addition, a Word Tree (WT) module is constructed to integrate synonymous nouns, which ensures that similar words hold similar and more accurate probabilities. Moreover, a Feature Enhancement (FE) module is adopted to enhance the ability to mine different visual concepts in different scales. Experiments conducted on the large-scale Microsoft (MS) COCO dataset illustrate the effectiveness of our proposed method.
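The Word Tree idea above — giving synonymous nouns similar, more confident scores instead of treating them as unrelated labels — can be illustrated with a minimal sketch. This is not the authors' code; the synonym groups, word list, and probabilities below are invented for illustration, and the aggregation rule (propagating the group maximum) is one plausible choice among several.

```python
# Hypothetical sketch of synonym integration in the spirit of the
# Word Tree (WT) module: words in the same synonym group share the
# group's strongest detection probability, so "bike" and "bicycle"
# no longer compete as independent attributes.

def merge_synonym_probs(word_probs, synonym_groups):
    """Return a copy of word_probs where every word in a synonym
    group is assigned the maximum probability found in that group."""
    merged = dict(word_probs)
    for group in synonym_groups:
        present = [w for w in group if w in merged]
        if not present:
            continue
        group_prob = max(merged[w] for w in present)
        for w in present:
            merged[w] = group_prob  # similar words get similar scores
    return merged

# Invented example: raw per-word attribute probabilities from a detector.
probs = {"bicycle": 0.7, "bike": 0.2, "dog": 0.9}
groups = [["bicycle", "bike", "cycle"]]
print(merge_synonym_probs(probs, groups))
```

After merging, "bike" inherits the 0.7 score of its synonym "bicycle" while unrelated words such as "dog" are untouched; in the paper this grouping is built from a word tree over the attribute vocabulary rather than hand-written lists.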


Similar articles

1. Extracting Effective Image Attributes with Refined Universal Detection.
   Sensors (Basel). 2020 Dec 25;21(1):95. doi: 10.3390/s21010095.
2. Image Captioning with End-to-end Attribute Detection and Subsequent Attributes Prediction.
   IEEE Trans Image Process. 2020 Jan 30. doi: 10.1109/TIP.2020.2969330.
3. Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network.
   Sensors (Basel). 2022 Nov 1;22(21):8376. doi: 10.3390/s22218376.
4. Image Captioning and Visual Question Answering Based on Attributes and External Knowledge.
   IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1367-1381. doi: 10.1109/TPAMI.2017.2708709. Epub 2017 May 26.
5. Visual Cluster Grounding for Image Captioning.
   IEEE Trans Image Process. 2022;31:3920-3934. doi: 10.1109/TIP.2022.3177318. Epub 2022 Jun 9.
6. Dual Global Enhanced Transformer for image captioning.
   Neural Netw. 2022 Apr;148:129-141. doi: 10.1016/j.neunet.2022.01.011. Epub 2022 Jan 21.
7. UAT: Universal Attention Transformer for Video Captioning.
   Sensors (Basel). 2022 Jun 25;22(13):4817. doi: 10.3390/s22134817.
8. Hierarchical LSTMs with Adaptive Attention for Visual Captioning.
   IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1112-1131. doi: 10.1109/TPAMI.2019.2894139. Epub 2019 Jan 21.
9. Social Image Captioning: Exploring Visual Attention and User Attention.
   Sensors (Basel). 2018 Feb 22;18(2):646. doi: 10.3390/s18020646.
10. Dual Position Relationship Transformer for Image Captioning.
    Big Data. 2022 Dec;10(6):515-527. doi: 10.1089/big.2021.0262. Epub 2022 Jan 4.

References cited in this article

1. Image Captioning with End-to-end Attribute Detection and Subsequent Attributes Prediction.
   IEEE Trans Image Process. 2020 Jan 30. doi: 10.1109/TIP.2020.2969330.
2. Focal Loss for Dense Object Detection.
   IEEE Trans Pattern Anal Mach Intell. 2020 Feb;42(2):318-327. doi: 10.1109/TPAMI.2018.2858826. Epub 2018 Jul 23.
3. Mask R-CNN.
   IEEE Trans Pattern Anal Mach Intell. 2020 Feb;42(2):386-397. doi: 10.1109/TPAMI.2018.2844175. Epub 2018 Jun 5.
4. Image Captioning and Visual Question Answering Based on Attributes and External Knowledge.
   IEEE Trans Pattern Anal Mach Intell. 2018 Jun;40(6):1367-1381. doi: 10.1109/TPAMI.2017.2708709. Epub 2017 May 26.
5. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
   IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1137-1149. doi: 10.1109/TPAMI.2016.2577031. Epub 2016 Jun 6.
6. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.
   IEEE Trans Pattern Anal Mach Intell. 2015 Sep;37(9):1904-16. doi: 10.1109/TPAMI.2015.2389824.
7. Long short-term memory.
   Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.