Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation.

Publication information

IEEE Trans Image Process. 2019 Apr;28(4):1720-1731. doi: 10.1109/TIP.2018.2881928. Epub 2018 Nov 16.

Abstract

Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts, ranging from objects and scenes to abstract concepts, and 2) how to annotate an image with the optimal number of class labels. To address the first issue, we propose a novel multi-scale deep model for extracting rich and discriminative features capable of representing a wide range of visual concepts. Specifically, we propose a novel two-branch deep neural network architecture, which comprises a very deep main network branch and a companion feature fusion network branch designed to fuse the multi-scale features computed from the main branch. The deep model is also made multi-modal by taking noisy user-provided tags as model input to complement the image input. To tackle the second issue, we add an auxiliary label quantity prediction task alongside the main label prediction task to explicitly estimate the optimal number of labels for a given image. Extensive experiments are carried out on two large-scale image annotation benchmark datasets, and the results show that our method significantly outperforms the state of the art.
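
As a rough illustration of the second issue, the sketch below shows one way a predicted label count could be combined with per-label scores at inference time: keep the k highest-scoring labels, where k is the rounded count. The abstract does not say how the two outputs are combined, so the function `annotate`, its parameters, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def annotate(label_scores: Dict[str, float], predicted_count: float,
             min_labels: int = 1) -> List[str]:
    """Keep the k highest-scoring labels, where k comes from a count predictor.

    Hypothetical helper: the top-k selection rule is an assumption,
    not the method described in the paper.
    """
    # Round the (possibly fractional) predicted count and clamp it
    # to the range [min_labels, number of candidate labels].
    k = max(min_labels, min(len(label_scores), int(round(predicted_count))))
    # Sort labels by descending score; ties are broken alphabetically.
    ranked = sorted(label_scores, key=lambda name: (-label_scores[name], name))
    return ranked[:k]


# Made-up scores covering an object, a scene, and an abstract concept.
scores = {"beach": 0.91, "sky": 0.84, "dog": 0.40, "happiness": 0.35, "car": 0.02}
print(annotate(scores, predicted_count=2.6))  # ['beach', 'sky', 'dog']
```

A fixed top-k or a fixed score threshold would assign the same label budget to every image; a per-image count lets sparse and cluttered images receive different numbers of labels.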
