Zhang Yu-Pei, Chan Kwok-Leung
Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China.
Sensors (Basel). 2021 Dec 15;21(24):8374. doi: 10.3390/s21248374.
Detecting saliency in videos is a fundamental step in many computer vision systems. Saliency refers to the significant target(s) in the video; the object of interest is further analyzed for high-level applications. Saliency can be segregated from the background if the two exhibit different visual cues, so saliency detection is often formulated as background subtraction. However, saliency detection is challenging. For instance, a dynamic background can cause false positive errors, while camouflage causes false negative errors. With moving cameras, the captured scenes are even more complicated to handle. We propose a new framework, called saliency detection via background model completion (SD-BMC), that comprises a background modeler and a deep learning background/foreground segmentation network. The background modeler generates an initial clean background image from a short image sequence. Based on the idea of video completion, a good background frame can be synthesized even when a changing background and moving objects co-exist. We adopt a background/foreground segmenter that was pre-trained on a specific video dataset; it can also detect saliency in unseen videos. The background modeler can adjust the background image dynamically when the output of the background/foreground segmenter deteriorates during the processing of a long video. To the best of our knowledge, our framework is the first to adopt video completion for background modeling and saliency detection in videos captured by moving cameras. The F-measure results obtained on pan-tilt-zoom (PTZ) videos show that our proposed framework outperforms some deep learning-based background subtraction models by 11% or more. On more challenging videos, our framework also outperforms many high-ranking background subtraction methods by more than 3%.
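The abstract notes that saliency detection is often formulated as background subtraction: estimate a clean background model, then mark pixels that deviate from it as foreground. The snippet below is a minimal sketch of that classical formulation only, not the SD-BMC framework itself; the temporal-median background estimate and the fixed deviation threshold are illustrative assumptions, whereas the paper's method uses video completion and a deep segmentation network.

```python
import numpy as np

def median_background(frames):
    """Estimate a clean background as the per-pixel temporal median
    of a short grayscale image sequence (list of 2-D arrays).
    The median suppresses objects that appear in only a few frames."""
    return np.median(np.stack(frames, axis=0), axis=0)

def subtract_background(frame, background, threshold=25.0):
    """Classify pixels whose absolute deviation from the background
    model exceeds `threshold` as foreground (candidate saliency)."""
    diff = np.abs(frame.astype(np.float64) - background)
    return diff > threshold  # boolean foreground mask
```

This per-pixel scheme illustrates why the cases listed in the abstract are hard: a dynamic background (e.g., swaying trees) pushes background pixels over the threshold (false positives), while a camouflaged object stays under it (false negatives), which motivates replacing the fixed model and threshold with learned components.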