Zhang Jin, Zhang Xi, Zhang Yanyan, Duan Yexin, Li Yang, Pan Zhisong
IEEE Trans Image Process. 2021;30:9058-9068. doi: 10.1109/TIP.2021.3122102. Epub 2021 Nov 2.
Background subtraction is a classic video processing task pervasive in visual applications such as video surveillance and traffic monitoring. Given the diversity and variability of real application scenes, an ideal background subtraction model should be robust across scenarios. Although deep-learning approaches have demonstrated unprecedented improvements, they often fail to generalize to unseen scenes and are therefore less suitable for wide deployment. In this work, we tackle cross-scene background subtraction with a two-phase framework comprising meta-knowledge learning and domain adaptation. Specifically, observing that meta-knowledge (i.e., scene-independent common knowledge) is the cornerstone of generalization to unseen scenes, we draw on traditional frame-differencing algorithms and design a deep difference network (DDN) that encodes meta-knowledge, especially temporal-change knowledge, from diverse cross-scene data (the source domain) free of intermittent foreground motion patterns. In addition, we explore a self-training domain adaptation strategy based on iterative evolution: with iteratively updated pseudo-labels, the DDN is continuously fine-tuned and progressively adapts to unseen scenes (the target domain) in an unsupervised fashion. Our framework can be deployed on unseen scenes without relying on any of their annotations. As evidenced by our experiments on the CDnet2014 dataset, it brings a significant improvement to background subtraction. Our method has a favorable processing speed (70 fps) and outperforms the best unsupervised algorithm and the top supervised algorithm designed for unseen scenes by 9% and 3%, respectively.
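For context on the traditional frame-differencing idea the DDN draws on, a minimal sketch of classic two-frame differencing follows. This is not the paper's method; the function name and the threshold value are illustrative assumptions:

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Classic frame differencing: mark a pixel as moving foreground
    when its absolute intensity change between two consecutive
    grayscale frames exceeds a fixed threshold."""
    # Widen dtype before subtracting so uint8 values do not wrap around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)  # 1 = changed (foreground)
```

Such a mask captures temporal change but flickers on intermittently moving objects, which is precisely the limitation the learned DDN is designed to overcome.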