Yang Yixin, Pan Jinshan, Peng Zhongzheng, Du Xiaoyu, Tao Zhulin, Tang Jinhui
IEEE Trans Pattern Anal Mach Intell. 2024 Aug;46(8):5612-5624. doi: 10.1109/TPAMI.2024.3370920. Epub 2024 Jul 2.
How to effectively explore the colors of exemplars and propagate them to colorize each frame is vital for exemplar-based video colorization. In this article, we present BiSTNet, which explores the colors of exemplars and uses them to aid video colorization through bidirectional temporal feature fusion under the guidance of a semantic image prior. We first establish the semantic correspondence between each frame and the exemplars in a deep feature space to extract color information from the exemplars. We then develop a simple yet effective bidirectional temporal feature fusion module to propagate the colors of the exemplars into each frame while avoiding inaccurate alignment. We note that color-bleeding artifacts commonly appear around the boundaries of salient objects in videos. To overcome this problem, we develop a mixed expert block that extracts semantic information to model object boundaries, so that the semantic image prior can better guide the colorization process. In addition, we develop a multi-scale refinement block that progressively colorizes frames in a coarse-to-fine manner. Extensive experimental results demonstrate that the proposed BiSTNet performs favorably against state-of-the-art methods on benchmark datasets and real-world scenes. Moreover, BiSTNet won first place in the NTIRE 2023 video colorization challenge (Kang et al. 2023).
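To make the bidirectional temporal feature fusion idea concrete, the sketch below shows one plausible way to fuse per-frame features with a forward pass (propagating past context) and a backward pass (propagating future context). This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bidirectional_temporal_fusion`, the exponential blending with weight `alpha`, and the equal-weight fusion of the two directions are all assumptions introduced here for clarity.

```python
import numpy as np

def bidirectional_temporal_fusion(frame_feats, alpha=0.5):
    """Illustrative sketch (not the paper's implementation) of fusing
    per-frame features with forward and backward temporal passes.

    frame_feats: array of shape (T, C), one feature vector per frame.
    alpha: assumed blending weight between the current frame and the
           propagated temporal state.
    Returns fused features of shape (T, C).
    """
    T, _ = frame_feats.shape
    fwd = np.zeros_like(frame_feats)
    bwd = np.zeros_like(frame_feats)

    # Forward pass: each frame blends its own features with the
    # state propagated from earlier frames.
    state = frame_feats[0].copy()
    for t in range(T):
        state = alpha * frame_feats[t] + (1 - alpha) * state
        fwd[t] = state

    # Backward pass: propagate from the last frame toward the first,
    # so each frame also sees context from later frames.
    state = frame_feats[-1].copy()
    for t in range(T - 1, -1, -1):
        state = alpha * frame_feats[t] + (1 - alpha) * state
        bwd[t] = state

    # Fuse the two directions so every frame receives color cues
    # from both temporal neighbors.
    return 0.5 * (fwd + bwd)
```

Because each frame receives information from both directions, a frame far from the exemplar in one temporal direction can still be constrained by nearby frames in the other, which is the intuition behind avoiding the drift of purely forward propagation.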