IEEE Trans Image Process. 2022;31:5067-5078. doi: 10.1109/TIP.2022.3192717. Epub 2022 Aug 2.
We propose a vision-based framework for dynamic sky replacement and harmonization in videos. Unlike previous sky editing methods that either focus on static photos or require real-time pose signals from the camera's inertial measurement units, our method is purely vision-based, places no requirements on the capture device, and applies equally well to online and offline processing scenarios. Our method runs in real time and requires no manual interaction. We decompose video sky replacement into several proxy tasks: motion estimation, sky matting, and image blending. We derive the motion equation of an object at infinity on the image plane under camera motion and propose "flow propagation", a novel method for robust motion estimation. We also propose a coarse-to-fine sky matting network to predict accurate sky mattes, and design an image blending step to improve harmonization. Experiments on diverse videos captured in the wild show the high fidelity and good generalization capability of our framework in both visual quality and lighting/motion dynamics. We also introduce a new method for content-aware image augmentation and show that it benefits visual perception in autonomous driving scenarios. Our code and animated results are available at https://github.com/jiupinjia/SkyAR.
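The compositing step named above can be sketched as matte-weighted alpha blending. This is a minimal illustration only: `blend_sky` is a hypothetical helper, and the paper's actual blending step additionally performs motion warping of the sky template and color harmonization, which are omitted here.

```python
import numpy as np

def blend_sky(frame: np.ndarray, sky: np.ndarray, matte: np.ndarray) -> np.ndarray:
    """Alpha-blend a sky template into a video frame using a predicted sky matte.

    frame, sky: float arrays in [0, 1] with shape (H, W, 3)
    matte:      float array in [0, 1] with shape (H, W, 1), where 1 marks sky pixels

    Sky regions take their color from the template; non-sky regions keep the
    original frame; soft matte values produce smooth transitions at boundaries.
    """
    return matte * sky + (1.0 - matte) * frame
```

For example, a matte value of 0.5 at a boundary pixel yields an even mix of the sky template and the original frame at that pixel.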