Liu Zhi-Song, Cani Marie-Paule, Siu Wan-Chi
IEEE Trans Image Process. 2022;31:1857-1869. doi: 10.1109/TIP.2022.3148819. Epub 2022 Feb 16.
We present See360, a versatile and efficient framework for 360° panoramic view interpolation using latent space viewpoint estimation. Most existing view rendering approaches focus only on indoor or synthetic 3D environments and render new views of small objects. In contrast, we propose treating camera-centered view synthesis as a 2D affine transformation, without using point clouds or depth maps, which enables effective 360° panoramic scene exploration. Given a pair of reference images, the See360 model learns to render novel views through the proposed Multi-Scale Affine Transformer (MSAT), which enables coarse-to-fine feature rendering. We also propose a Conditional Latent space AutoEncoder (C-LAE) to achieve view interpolation at arbitrary angles. To show the versatility of our method, we introduce four training datasets, namely UrbanCity360, Archinterior360, HungHom360 and Lab360, which are collected from indoor and outdoor environments for both real and synthetic rendering. Experimental results show that the proposed method is generic enough to achieve real-time rendering of arbitrary views for all four datasets. In addition, our See360 model can be applied to view synthesis in the wild: with only a short extra training time (approximately 10 minutes), it is able to render unknown real-world scenes. The superior performance of See360 opens up a promising direction for camera-centered view rendering and 360° panoramic view interpolation.
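The idea of treating view synthesis as a 2D affine transformation applied from coarse to fine can be illustrated with a small sketch. This is not the authors' MSAT, whose transformation parameters are learned by the network. It only shows, assuming a fixed hand-picked transform, how one 2D affine map p_src = A p + t is applied to a feature map and how the same map is expressed at a coarser scale (A unchanged, t divided by the scale factor). The names `affine_warp` and `warp_at_scale` are hypothetical and introduced here for illustration only.

```python
import numpy as np


def affine_warp(feat, matrix, offset):
    """Warp a 2D map so that out[y, x] = feat at (matrix @ (x, y) + offset).

    Nearest-neighbour sampling; samples falling outside the map are set to 0.
    """
    h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel()]).astype(float)  # 2 x N, rows are (x, y)
    src = np.asarray(matrix, dtype=float) @ coords + np.asarray(offset, dtype=float).reshape(2, 1)
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=feat.dtype)
    out[valid] = feat[sy[valid], sx[valid]]
    return out.reshape(h, w)


def warp_at_scale(feat, matrix, offset, scale):
    """Apply the same affine map on a map subsampled by an integer `scale`.

    In the coarse coordinates p / scale the linear part is unchanged and
    the translation shrinks to offset / scale.
    """
    coarse = feat[::scale, ::scale]
    return affine_warp(coarse, matrix, np.asarray(offset, dtype=float) / scale)


# Illustrative values only: a pure horizontal shift of 2 pixels.
feat = np.arange(64).reshape(8, 8)
identity = np.eye(2)
shift = np.array([2.0, 0.0])

fine = affine_warp(feat, identity, shift)          # full-resolution warp
coarse = warp_at_scale(feat, identity, shift, 2)   # same warp at half resolution
```

In this toy setting the coarse result agrees with the fine result subsampled by the same factor, which is the consistency a coarse-to-fine scheme relies on. A learned model would predict the transformation per scale rather than fix it by hand.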