Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images.

Authors

Huang Kengda, Zhou Wujie, Fang Meixin

Affiliations

School of Information and Electronic Engineering, Zhejiang University of Science & Technology, Hangzhou 310023, China.

Institute of Information and Communication Engineering, Zhejiang University, Hangzhou 310027, China.

Publication information

Comput Intell Neurosci. 2021 May 5;2021:6610997. doi: 10.1155/2021/6610997. eCollection 2021.

DOI: 10.1155/2021/6610997

PMID: 34035801

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8116150/

Abstract

In recent years, the prediction of salient regions in RGB-D images has become a focus of research. Saliency prediction for RGB-D images is more challenging than for RGB images. In this study, we propose a novel deep multimodal fusion autoencoder for the saliency prediction of RGB-D images. The core trainable autoencoder of the RGB-D saliency prediction model takes two raw modalities (RGB and depth/disparity information) as inputs and the corresponding eye-fixation attributes as labels. The autoencoder comprises four main networks: a color channel network, a disparity channel network, a feature concatenated network, and a feature learning network. It can mine the complex relationships between color and disparity cues and make the most of their complementary characteristics. Finally, the saliency map is predicted by a feature combination subnetwork, which combines the deep features extracted by the prior learning and convolutional feature learning subnetworks. We compare the proposed autoencoder with other saliency prediction models on two publicly available benchmark datasets. The results demonstrate that the proposed autoencoder outperforms these models by a significant margin.
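The abstract names the components of the model but not their layers, so the following is only a minimal, untrained sketch of how a two-stream RGB-D fusion forward pass with this overall shape could be wired, written in pure Python so it runs without dependencies. Everything concrete here is an assumption and not the authors' design: the helper names (`conv3x3`, `conv_block`, `center_prior`, `predict_saliency`), the single 3x3 convolution per network, the random weights, the Gaussian center-bias map standing in for the prior learning subnetwork, and the sigmoid 1x1 combination. No training against eye-fixation labels is shown, and the sketch keeps full resolution instead of encoding and decoding.

```python
import math
import random


def conv3x3(channels, kernels, bias=0.0):
    """Zero-padded 3x3 convolution: applies one 3x3 kernel per input channel
    and sums the results into a single output map of the same size."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[bias] * w for _ in range(h)]
    for x, k in zip(channels, kernels):
        for i in range(h):
            for j in range(w):
                for di in range(3):
                    for dj in range(3):
                        ii, jj = i + di - 1, j + dj - 1
                        if 0 <= ii < h and 0 <= jj < w:
                            out[i][j] += k[di][dj] * x[ii][jj]
    return out


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def conv_block(channels, n_out, rng):
    """Hypothetical layer: n_out ReLU(3x3 conv) maps with random, untrained weights."""
    outs = []
    for _ in range(n_out):
        kernels = [[[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
                   for _ in channels]
        outs.append(relu(conv3x3(channels, kernels)))
    return outs


def center_prior(h, w, sigma=0.3):
    """Assumed prior map: Gaussian center bias (stand-in for the prior learning subnetwork)."""
    ci, cj = (h - 1) / 2, (w - 1) / 2
    return [[math.exp(-(((i - ci) / h) ** 2 + ((j - cj) / w) ** 2) / (2 * sigma ** 2))
             for j in range(w)] for i in range(h)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_saliency(rgb, disparity, n_feat=4, seed=0):
    """rgb: list of 3 HxW maps; disparity: one HxW map. Returns an HxW map in [0, 1]."""
    rng = random.Random(seed)
    h, w = len(disparity), len(disparity[0])
    color_feats = conv_block(rgb, n_feat, rng)         # color channel network
    disp_feats = conv_block([disparity], n_feat, rng)  # disparity channel network
    fused = color_feats + disp_feats                   # concatenation along the channel axis
    learned = conv_block(fused, n_feat, rng)           # feature learning network
    combined = learned + [center_prior(h, w)]          # deep features + prior map
    weights = [rng.uniform(-0.5, 0.5) for _ in combined]
    # Feature combination: 1x1 weighted sum over channels, then a sigmoid.
    return [[sigmoid(sum(wt * m[i][j] for wt, m in zip(weights, combined)))
             for j in range(w)] for i in range(h)]


rgb = [[[(i + j + c) / 20 for j in range(6)] for i in range(5)] for c in range(3)]
disp = [[j / 5 for j in range(6)] for i in range(5)]
sal = predict_saliency(rgb, disp)
print(len(sal), len(sal[0]))  # 5 6
```

Keeping the color and disparity streams separate until the concatenation step mirrors the abstract's split between color and disparity cues; a practical implementation would use a deep-learning framework, deeper networks, and weights learned from eye-fixation data.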


[Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c0e8/8116150/ab7f2cc279e8/CIN2021-6610997.001.jpg]

Similar articles

Hierarchical Multimodal Adaptive Fusion (HMAF) Network for Prediction of RGB-D Saliency.

Comput Intell Neurosci. 2020 Nov 20;2020:8841681. doi: 10.1155/2020/8841681. eCollection 2020.

Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images.

IEEE Trans Image Process. 2022;31:1107-1119. doi: 10.1109/TIP.2021.3139232. Epub 2022 Jan 12.

CDNet: Complementary Depth Network for RGB-D Salient Object Detection.

IEEE Trans Image Process. 2021;30:3376-3390. doi: 10.1109/TIP.2021.3060167. Epub 2021 Mar 9.

Attention-based fusion network for human eye-fixation prediction in 3D images.

Opt Express. 2019 Nov 11;27(23):34056-34066. doi: 10.1364/OE.27.034056.

ASIF-Net: Attention Steered Interweave Fusion Network for RGB-D Salient Object Detection.

IEEE Trans Cybern. 2021 Jan;51(1):88-100. doi: 10.1109/TCYB.2020.2969255. Epub 2020 Dec 22.

DMRA: Depth-Induced Multi-Scale Recurrent Attention Network for RGB-D Saliency Detection.

IEEE Trans Image Process. 2022;31:2321-2336. doi: 10.1109/TIP.2022.3154931. Epub 2022 Mar 11.

RGB-'D' Saliency Detection With Pseudo Depth.

IEEE Trans Image Process. 2019 May;28(5):2126-2139. doi: 10.1109/TIP.2018.2882156. Epub 2018 Nov 19.

CNNs-Based RGB-D Saliency Detection via Cross-View Transfer and Multiview Fusion.

IEEE Trans Cybern. 2018 Nov;48(11):3171-3183. doi: 10.1109/TCYB.2017.2761775. Epub 2017 Oct 31.

Learning Discriminative Cross-Modality Features for RGB-D Saliency Detection.

IEEE Trans Image Process. 2022;31:1285-1297. doi: 10.1109/TIP.2022.3140606. Epub 2022 Jan 25.

Cited by

Using Convolutional Neural Networks for the Assessment Research of Mental Health.

Comput Intell Neurosci. 2022 May 9;2022:1636855. doi: 10.1155/2022/1636855. eCollection 2022.

Dynamic Invariant-Specific Representation Fusion Network for Multimodal Sentiment Analysis.

Comput Intell Neurosci. 2022 Jan 24;2022:2105593. doi: 10.1155/2022/2105593. eCollection 2022.

