Suppr 超能文献



A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling.

Authors

Asif Umar, Bennamoun Mohammed, Sohel Ferdous A

Publication

IEEE Trans Pattern Anal Mach Intell. 2018 Sep;40(9):2051-2065. doi: 10.1109/TPAMI.2017.2747134. Epub 2017 Aug 30.

DOI: 10.1109/TPAMI.2017.2747134
PMID: 28866483
Abstract

While deep convolutional neural networks have shown a remarkable success in image classification, the problems of inter-class similarities, intra-class variances, the effective combination of multi-modal data, and the spatial variability in images of objects remain to be major challenges. To address these problems, this paper proposes a novel framework to learn a discriminative and spatially invariant classification model for object and indoor scene recognition using multi-modal RGB-D imagery. This is achieved through three postulates: 1) spatial invariance: this is achieved by combining a spatial transformer network with a deep convolutional neural network to learn features which are invariant to spatial translations, rotations, and scale changes; 2) high discriminative capability: this is achieved by introducing Fisher encoding within the CNN architecture to learn features which have small inter-class similarities and large intra-class compactness; and 3) multi-modal hierarchical fusion: this is achieved through the regularization of semantic segmentation to a multi-modal CNN architecture, where class probabilities are estimated at different hierarchical levels (i.e., image- and pixel-levels), and fused into a Conditional Random Field (CRF)-based inference hypothesis, the optimization of which produces consistent class labels in RGB-D images. Extensive experimental evaluations on RGB-D object and scene datasets, and live video streams (acquired from Kinect) show that our framework produces superior object and scene classification results compared to the state-of-the-art methods.
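The Fisher-encoding postulate above can be illustrated standalone. Below is a minimal NumPy sketch of the improved Fisher vector (posterior-weighted gradients with respect to the means and variances of a diagonal-covariance GMM, followed by signed square-root and L2 normalization). Note this is a toy, out-of-network version: the paper embeds Fisher encoding inside the CNN, so the function name and the hand-built GMM here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fisher_vector(descriptors, means, covs, priors):
    """Improved Fisher vector for a diagonal-covariance GMM (toy sketch).

    descriptors: (N, D) local features; means, covs: (K, D); priors: (K,).
    Returns a 2*K*D vector (gradients w.r.t. means and variances).
    """
    N, D = descriptors.shape
    # Posterior responsibilities gamma (N, K), computed in log space for stability.
    diff2 = (descriptors[:, None, :] - means[None]) ** 2 / covs[None]      # (N, K, D)
    log_prob = -0.5 * (diff2.sum(axis=2) + np.log(2 * np.pi * covs).sum(axis=1)[None])
    log_w = np.log(priors)[None] + log_prob
    log_w -= log_w.max(axis=1, keepdims=True)
    gamma = np.exp(log_w)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Normalized gradients w.r.t. GMM means and variances.
    diff = (descriptors[:, None, :] - means[None]) / np.sqrt(covs)[None]   # (N, K, D)
    g_mu = (gamma[:, :, None] * diff).sum(axis=0) / (N * np.sqrt(priors)[:, None])
    g_sig = (gamma[:, :, None] * (diff ** 2 - 1)).sum(axis=0) / (N * np.sqrt(2 * priors)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    # "Improved" FV post-processing: power (signed sqrt) then L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)
```

In a classification pipeline the resulting fixed-length vector would feed a linear classifier (or, as in this paper, later CNN layers); its length 2*K*D is independent of the number of local descriptors, which is what makes the encoding useful as a pooling step.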


Similar Articles

1. A Multi-Modal, Discriminative and Spatially Invariant CNN for RGB-D Object Labeling.
   IEEE Trans Pattern Anal Mach Intell. 2018 Sep;40(9):2051-2065. doi: 10.1109/TPAMI.2017.2747134. Epub 2017 Aug 30.
2. Edge Preserving and Multi-Scale Contextual Neural Network for Salient Object Detection.
   IEEE Trans Image Process. 2018;27(1):121-134. doi: 10.1109/TIP.2017.2756825.
3. RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory.
   Sensors (Basel). 2019 Jan 27;19(3):529. doi: 10.3390/s19030529.
4. ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition.
   IEEE Trans Image Process. 2021;30:2722-2733. doi: 10.1109/TIP.2021.3053459. Epub 2021 Feb 10.
5. Learning Rotation-Invariant and Fisher Discriminative Convolutional Neural Networks for Object Detection.
   IEEE Trans Image Process. 2019 Jan;28(1):265-278. doi: 10.1109/TIP.2018.2867198.
6. Uniform and Variational Deep Learning for RGB-D Object Recognition and Person Re-Identification.
   IEEE Trans Image Process. 2019 Oct;28(10):4970-4983. doi: 10.1109/TIP.2019.2915655. Epub 2019 May 15.
7. Organ Segmentation in Poultry Viscera Using RGB-D.
   Sensors (Basel). 2018 Jan 3;18(1):117. doi: 10.3390/s18010117.
8. Three-stream Attention-aware Network for RGB-D Salient Object Detection.
   IEEE Trans Image Process. 2019 Jan 7. doi: 10.1109/TIP.2019.2891104.
9. TransMed: Transformers Advance Multi-Modal Medical Image Classification.
   Diagnostics (Basel). 2021 Jul 31;11(8):1384. doi: 10.3390/diagnostics11081384.
10. RGB-D based multi-modal deep learning for spacecraft and debris recognition.
   Sci Rep. 2022 Mar 10;12(1):3924. doi: 10.1038/s41598-022-07846-5.

Cited By

1. Performance Evaluation of Deep Learning Image Classification Modules in the MUN-ABSAI Ice Risk Management Architecture.
   Sensors (Basel). 2025 Jan 8;25(2):326. doi: 10.3390/s25020326.
2. A comparative analysis of different augmentations for brain images.
   Med Biol Eng Comput. 2024 Oct;62(10):3123-3150. doi: 10.1007/s11517-024-03127-7. Epub 2024 May 24.
3. Data Augmentation for Brain-Tumor Segmentation: A Review.
   Front Comput Neurosci. 2019 Dec 11;13:83. doi: 10.3389/fncom.2019.00083. eCollection 2019.