Shi Jinglei, Xu Yihong, Guillemot Christine
IEEE Trans Image Process. 2024;33:4060-4074. doi: 10.1109/TIP.2024.3418670. Epub 2024 Jul 4.
Light fields capture 3D scene information by recording light rays emitted from a scene at various orientations. They offer more immersive perception than classic 2D images, but at the cost of huge data volumes. In this paper, we design a compact neural network representation for the light field compression task. In the same vein as the deep image prior, the neural network takes randomly initialized noise as input and is trained in a supervised manner to best reconstruct the target light field Sub-Aperture Images (SAIs). The network is composed of two types of complementary kernels: descriptive kernels (descriptors) that store scene description information learned during training, and modulatory kernels (modulators) that control the rendering of different SAIs from the queried perspectives. To further enhance the compactness of the network while retaining high quality in the decoded light field, we propose modulator allocation and apply kernel tensor decomposition techniques, followed by non-uniform quantization and lossless entropy coding. Extensive experiments demonstrate that our method outperforms other state-of-the-art (SOTA) methods by a significant margin in the light field compression task. Moreover, after adapting descriptors, the modulators learned from one light field can be transferred to new light fields for rendering dense views, showing the potential of the solution for view synthesis.
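The last stage of the pipeline described above, non-uniform quantization followed by lossless entropy coding, can be illustrated with a minimal sketch. The abstract does not say which quantizer or entropy coder the authors use, so everything here is an assumption: a 1-D Lloyd (k-means) codebook stands in for the non-uniform quantizer, zlib's DEFLATE stands in for the entropy coder, and the Gaussian `weights` list is synthetic data standing in for trained kernel values. The function names are hypothetical.

```python
import random
import zlib


def lloyd_codebook(values, levels=8, iters=20):
    """Fit a 1-D Lloyd (k-means) codebook; levels end up denser where values are denser."""
    ordered = sorted(values)
    # Initialise centroids at evenly spaced quantiles of the data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * levels)] for i in range(levels)]
    for _ in range(iters):
        sums = [0.0] * levels
        counts = [0] * levels
        for v in values:
            k = min(range(levels), key=lambda j: abs(v - centroids[j]))
            sums[k] += v
            counts[k] += 1
        # Move each centroid to the mean of its cell; leave empty cells unchanged.
        centroids = [sums[k] / counts[k] if counts[k] else centroids[k] for k in range(levels)]
    return centroids


def quantize(values, centroids):
    """Map each value to the index of its nearest codebook entry."""
    return [min(range(len(centroids)), key=lambda j: abs(v - centroids[j])) for v in values]


def dequantize(indices, centroids):
    """Recover approximate values from codebook indices."""
    return [centroids[i] for i in indices]


random.seed(0)
# Synthetic stand-in for trained kernel weights (assumption, not real model data).
weights = [random.gauss(0.0, 0.05) for _ in range(2000)]

codebook = lloyd_codebook(weights, levels=8)
indices = quantize(weights, codebook)

# Lossless coding of the index stream; DEFLATE is a stand-in for the paper's entropy coder.
payload = zlib.compress(bytes(indices), 9)
decoded = list(zlib.decompress(payload))

restored = dequantize(decoded, codebook)
mse = sum((a - b) ** 2 for a, b in zip(weights, restored)) / len(weights)
```

Only the quantization step loses information; the index stream survives the compress/decompress round trip exactly, so the decoder needs just `payload` and the small `codebook` to rebuild the approximate weights.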