Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, China.
School of Computing Sciences, University of East Anglia, Norwich, U.K.
IEEE Trans Image Process. 2018;27(1):38-49. doi: 10.1109/TIP.2017.2754941.
This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) deep video saliency model training in the absence of sufficiently large, pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules, for capturing the spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables our network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, producing accurate spatiotemporal saliency estimates. We advance the state of the art on the densely annotated video segmentation data set (MAE of 0.06) and the Freiburg-Berkeley Motion Segmentation data set (MAE of 0.07), and do so with much improved speed (2 fps with all steps).
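As a rough illustration of the two-module design described in the abstract, the sketch below (PyTorch) wires a static (spatial) saliency branch to a dynamic (temporal) branch that consumes a pair of consecutive frames together with the static saliency estimate, so no optical flow is computed. The layer widths, module names, and input resolution are illustrative assumptions, not the architecture used in the paper.

```python
# Minimal sketch of the two-module idea: a static (spatial) saliency branch
# and a dynamic (temporal) branch that takes a frame pair plus the static
# saliency estimate. All layer sizes and names are illustrative assumptions.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU; a stand-in for real FCN stages."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class StaticSaliencyNet(nn.Module):
    """Spatial module: single RGB frame -> per-pixel saliency map."""

    def __init__(self):
        super().__init__()
        self.features = conv_block(3, 32)
        self.head = nn.Conv2d(32, 1, kernel_size=1)

    def forward(self, frame):
        return torch.sigmoid(self.head(self.features(frame)))


class DynamicSaliencyNet(nn.Module):
    """Temporal module: two consecutive frames plus the static saliency
    estimate -> spatiotemporal saliency map (no optical flow needed)."""

    def __init__(self):
        super().__init__()
        # 3 + 3 channels for the frame pair, +1 for the static saliency prior.
        self.features = conv_block(7, 32)
        self.head = nn.Conv2d(32, 1, kernel_size=1)

    def forward(self, frame_t, frame_t1, static_saliency):
        x = torch.cat([frame_t, frame_t1, static_saliency], dim=1)
        return torch.sigmoid(self.head(self.features(x)))


if __name__ == "__main__":
    static_net, dynamic_net = StaticSaliencyNet(), DynamicSaliencyNet()
    f_t = torch.randn(1, 3, 224, 224)   # frame at time t
    f_t1 = torch.randn(1, 3, 224, 224)  # frame at time t+1
    s = static_net(f_t)                 # spatial saliency prior
    out = dynamic_net(f_t, f_t1, s)     # spatiotemporal saliency estimate
    print(out.shape)                    # torch.Size([1, 1, 224, 224])
```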
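The video-simulation augmentation can likewise be sketched in a simplified form: starting from a still image and its pixel-wise saliency annotation, a second "frame" is synthesized by displacing the annotated object slightly, so the resulting pair mimics inter-frame object motion. The random-shift scheme, the function name simulate_frame_pair, and its parameters below are assumptions for illustration; the paper's synthesis procedure is more elaborate.

```python
# Simplified sketch of simulating a video frame pair from one annotated image:
# shift the salient object (and its mask) by a small random offset to mimic
# object motion. The shifting scheme here is an illustrative assumption only.
import numpy as np


def simulate_frame_pair(image, mask, max_shift=10, rng=None):
    """Return (frame_t, mask_t, frame_t1, mask_t1) from one annotated image.

    image: (H, W, 3) uint8 array; mask: (H, W) binary array.
    """
    rng = rng or np.random.default_rng()
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)

    # Shift the salient object and its mask by (dy, dx); the original image
    # provides the background where the object has moved away.
    shifted_mask = np.roll(mask, shift=(dy, dx), axis=(0, 1))
    shifted_obj = np.roll(image, shift=(dy, dx), axis=(0, 1))
    frame_t1 = np.where(shifted_mask[..., None].astype(bool), shifted_obj, image)
    return image, mask, frame_t1, shifted_mask


if __name__ == "__main__":
    img = np.zeros((64, 64, 3), dtype=np.uint8)
    msk = np.zeros((64, 64), dtype=np.uint8)
    img[20:40, 20:40] = 255  # a white square as the "salient object"
    msk[20:40, 20:40] = 1
    _, _, f1, m1 = simulate_frame_pair(img, msk)
    print(f1.shape, m1.sum())  # (64, 64, 3) and the shifted object's area
```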