Luo Yao, Pan Jinshan, Tang Jinhui
IEEE Trans Image Process. 2022;31:6773-6788. doi: 10.1109/TIP.2022.3215911. Epub 2022 Oct 28.
Recent video frame interpolation methods have employed the curvilinear motion model to accommodate nonlinear motion among frames. The effectiveness of such a model often hinges on motion estimation and occlusion detection, and is therefore greatly challenged in dynamic scenes that contain complex motions and occlusions. We address these challenges by proposing a bi-directional pseudo-three-dimensional network that exploits the correlation between motion estimation and depth-related occlusion estimation, where depth serves as the third dimension. Specifically, the network exploits this correlation in two ways. First, it learns shared multi-scale spatiotemporal representations. Second, it couples the two estimations, in both the past and future directions, to synthesize intermediate frames through a bi-directional pseudo-three-dimensional warping layer. In this layer, adaptive convolution kernels are estimated progressively from the coalescence of motion and depth-related occlusion estimations across multiple scales, which yields nonlocal and adaptive neighborhoods. The proposed network uses a novel multi-task collaborative learning strategy, which facilitates the supervised learning of video frame interpolation with complementary self-supervisory signals from motion and depth-related occlusion estimations. Across various benchmark datasets, the proposed method outperforms state-of-the-art methods in terms of accuracy, model size and runtime performance.
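To make the idea of coupling motion with depth-related occlusion concrete, the following is a minimal NumPy sketch of bi-directional warping with depth-weighted blending. It is not the network or the warping layer described in the abstract: it substitutes fixed nearest-neighbor sampling for the learned adaptive kernels, works on grayscale frames only, and assumes the flows and depth maps are already given. The names `backward_warp`, `blend_bidirectional`, `flow_t0`, `flow_t1`, `depth0` and `depth1` are hypothetical and chosen for illustration.

```python
import numpy as np


def backward_warp(image, flow):
    """Sample `image` at (x + u, y + v) using nearest-neighbor lookup.

    image: (H, W) array; flow: (H, W, 2) array holding (u, v) per pixel.
    Coordinates falling outside the image are clamped to the border.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]


def blend_bidirectional(frame0, frame1, flow_t0, flow_t1,
                        depth0, depth1, t=0.5, eps=1e-6):
    """Blend two warped frames into an intermediate frame at time t.

    flow_t0 / flow_t1 are assumed to map intermediate-frame pixels back
    to the past frame (frame0) and the future frame (frame1).
    Each candidate is weighted by temporal proximity and inverse depth,
    so the surface closer to the camera dominates where they disagree.
    """
    warped0 = backward_warp(frame0, flow_t0)
    warped1 = backward_warp(frame1, flow_t1)
    d0 = backward_warp(depth0, flow_t0)
    d1 = backward_warp(depth1, flow_t1)
    w0 = (1.0 - t) / (d0 + eps)   # weight of the past-direction candidate
    w1 = t / (d1 + eps)           # weight of the future-direction candidate
    return (w0 * warped0 + w1 * warped1) / (w0 + w1)
```

With zero flow and equal depth, the result reduces to a plain temporal average of the two frames. When one candidate is much farther from the camera, its weight shrinks and the nearer surface is kept, which is a crude stand-in for occlusion reasoning.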