Wei Lina, Zhao Shanshan, Bourahla Omar Farouk, Li Xi, Wu Fei, Zhuang Yueting, Han Junwei, Xu Mingliang
IEEE Trans Neural Netw Learn Syst. 2021 Apr;32(4):1691-1702. doi: 10.1109/TNNLS.2020.2986823. Epub 2021 Apr 2.
As an interesting and important problem in computer vision, learning-based video saliency detection aims to discover the visually interesting regions in a video sequence. Capturing the information within and between frames at different levels (such as spatial contexts, motion information, temporal consistency across frames, and multiscale representation) is important for this task. A key issue is how to jointly model all these factors within a unified, data-driven scheme in an end-to-end fashion. In this article, we propose an end-to-end spatiotemporal deep video saliency detection approach, which captures both spatial contexts and motion characteristics. Furthermore, it encodes the temporal consistency information across consecutive frames by implementing a convolutional long short-term memory (Conv-LSTM) model. In addition, the multiscale saliency properties of each frame are adaptively integrated for final saliency prediction in a collaborative feature-pyramid manner. Finally, the proposed approach unifies all the aforementioned parts into an end-to-end joint deep learning scheme. Experimental results demonstrate the effectiveness of our approach in comparison with state-of-the-art approaches.
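To make the Conv-LSTM component concrete: unlike a standard LSTM, a Conv-LSTM computes its gates with convolutions over the concatenated input frame features and hidden state, so the recurrent state keeps its spatial layout and can carry per-location saliency evidence across frames. The following is a minimal NumPy sketch of a single Conv-LSTM cell, not the paper's actual implementation; all class and function names (`ConvLSTMCell`, `conv2d_same`, etc.) are hypothetical, and the weights are random rather than learned.

```python
import numpy as np

def conv2d_same(x, w):
    # Naive 2-D cross-correlation with zero "same" padding.
    # x: (C_in, H, W), w: (C_out, C_in, k, k) -> (C_out, H, W)
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for u in range(k):
                for v in range(k):
                    out[o] += w[o, i, u, v] * xp[i, u:u + H, v:v + W]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """One recurrence step: all four gates (input, forget, output,
    candidate) come from a single convolution over [x; h], so the
    hidden and cell states stay spatial maps."""
    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # One weight tensor producing all four gates, stacked on channels.
        self.w = rng.normal(0.0, 0.1, (4 * hid_ch, in_ch + hid_ch, k, k))

    def step(self, x, h, c):
        z = conv2d_same(np.concatenate([x, h], axis=0), self.w)
        i, f, o, g = np.split(z, 4, axis=0)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell update
        h_new = sigmoid(o) * np.tanh(c_new)               # spatial hidden map
        return h_new, c_new

# Roll the cell over a short sequence of (single-channel) frame features.
cell = ConvLSTMCell(in_ch=1, hid_ch=2)
h = np.zeros((2, 8, 8))
c = np.zeros((2, 8, 8))
for t in range(4):
    frame = np.random.default_rng(t).random((1, 8, 8))
    h, c = cell.step(frame, h, c)
print(h.shape)  # hidden state is still a (2, 8, 8) spatial map
```

Because the hidden state is itself a feature map, the saliency head at each time step can read per-pixel temporal context directly, which is what lets the model enforce consistency across consecutive frames.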