Xiao Zeyu, Cheng Zhen, Xiong Zhiwei
IEEE Trans Image Process. 2023;32:4785-4799. doi: 10.1109/TIP.2023.3300121. Epub 2023 Aug 25.
Light field (LF) cameras suffer from a fundamental trade-off between spatial and angular resolution. In addition, because of the large amount of data that must be recorded, a modern LF camera such as the Lytro ILLUM can capture only three frames per second. In this paper, we consider space-time super-resolution (SR) for LF videos, aiming to generate high-resolution, high-frame-rate LF videos from low-resolution, low-frame-rate observations. Directly extending existing space-time video SR methods to this task poses two key challenges: 1) how to re-organize sub-aperture images (SAIs) efficiently and effectively given highly redundant LF videos, and 2) how to aggregate complementary information across multiple SAIs and frames while respecting the coherence of LF videos. To address these challenges, we propose, for the first time, a framework for space-time super-resolution of LF videos. First, we propose a novel Multi-Scale Dilated SAI Re-organization strategy that re-organizes SAIs into auxiliary view stacks whose resolution decreases as the Chebyshev distance in the angular dimension increases. In particular, the auxiliary view stack at the original resolution preserves essential visual details, while the down-scaled view stacks capture long-range contextual information. Second, we propose a Multi-Scale Aggregated Feature extractor and an Angular-Assisted Feature Interpolation module to utilize and aggregate information from the spatial, angular, and temporal dimensions of LF videos. The former aggregates similar content from different SAIs and frames in a disparity-free manner at the feature level for subsequent reconstruction, whereas the latter interpolates intermediate frames temporally by implicitly aggregating geometric information.
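The re-organization strategy above can be illustrated with a minimal sketch: SAIs are grouped into stacks by their Chebyshev distance from the center view in the angular grid, and more distant stacks are down-scaled. The halving-per-distance-level schedule and the average-pooling downscaler here are illustrative assumptions, not necessarily the paper's exact design.

```python
import numpy as np

def downscale(img, factor):
    """Average-pool an (H, W, ...) image by `factor` along H and W.
    Illustrative stand-in for whatever downscaler the method uses."""
    if factor == 1:
        return img
    H, W = img.shape[:2]
    img = img[: H - H % factor, : W - W % factor]  # crop to divisible size
    h, w = img.shape[0] // factor, img.shape[1] // factor
    return img.reshape(h, factor, w, factor, *img.shape[2:]).mean(axis=(1, 3))

def reorganize_sais(lf, center=None):
    """Group SAIs of an LF array shaped (U, V, H, W, ...) into auxiliary
    view stacks keyed by Chebyshev distance d from the center view;
    each stack at distance d is down-scaled by 2**d (assumed schedule)."""
    U, V = lf.shape[:2]
    if center is None:
        center = (U // 2, V // 2)
    u0, v0 = center
    stacks = {}
    for u in range(U):
        for v in range(V):
            d = max(abs(u - u0), abs(v - v0))  # Chebyshev distance in angular plane
            stacks.setdefault(d, []).append(downscale(lf[u, v], 2 ** d))
    return stacks
```

For a 5×5 angular grid, this yields one full-resolution center view (d = 0), a ring of 8 half-resolution views (d = 1), and a ring of 16 quarter-resolution views (d = 2), so near views keep fine detail while far views supply coarse long-range context.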
Experimental results demonstrate that, compared with other potential approaches, the LF videos reconstructed by our framework achieve higher reconstruction quality and better preserve the LF parallax structure and temporal consistency. The implementation code is available at https://github.com/zeyuxiao1997/LFSTVSR.