Zhou Yang, Xu Yangyang, Du Yong, Wen Qiang, He Shengfeng
IEEE Trans Image Process. 2022;31:1230-1242. doi: 10.1109/TIP.2022.3140603. Epub 2022 Jan 19.
The state-of-the-art photo upsampling method, PULSE, demonstrates that a sharp, high-resolution (HR) version of a given low-resolution (LR) input can be obtained by exploring the latent space of generative models. However, mapping an extreme LR input (16×16) directly to an HR image (1024×1024) is too ambiguous to preserve faithful local facial semantics. In this paper, we propose an enhanced upsampling approach, Pro-PULSE, that addresses the issues of semantic inconsistency and optimization complexity. Our idea is to learn an encoder that progressively constructs the HR latent codes in the extended W+ latent space of StyleGAN. This design divides the complex 64× upsampling problem into several steps, so that small-scale facial semantics can be inherited from one end to the other. In particular, we train two encoders: the base encoder maps the input to a latent vector in W space, which serves as the foundation of the HR latent vector, while the second, scale-specific encoder operates in W+ space and gradually replaces the vector produced by the base encoder at each scale. This process produces intermediate side-outputs, which inject deep supervision into the training of the encoder. Extensive experiments demonstrate superiority over the latest latent space exploration methods in terms of efficiency, quantitative quality metrics, and qualitative visual results.
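The progressive construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two encoders are replaced by vectors passed in directly, all function names are hypothetical, and the assignment of style layers to scales (two layers per resolution, as in a 1024×1024 StyleGAN generator with 18 style layers) is an assumption made for illustration only.

```python
from typing import Dict, Iterator, List, Tuple

Vector = List[float]

# A 1024x1024 StyleGAN generator uses 18 style layers:
# two per resolution for 4, 8, 16, ..., 1024.
NUM_LAYERS = 18


def upsampling_steps(lr_size: int = 16, hr_size: int = 1024) -> List[int]:
    """Resolutions reached by repeated doubling from lr_size up to hr_size."""
    sizes = []
    size = lr_size
    while size < hr_size:
        size *= 2
        sizes.append(size)
    return sizes


def layers_for_resolution(size: int) -> List[int]:
    """Assumed mapping: the two style layers applied at resolution `size`."""
    level = size.bit_length() - 3  # 4 -> 0, 8 -> 1, ..., 1024 -> 8
    return [2 * level, 2 * level + 1]


def progressive_codes(
    base_w: Vector, scale_codes: Dict[int, Vector]
) -> Iterator[Tuple[int, List[Vector]]]:
    """Yield one W+ code per scale (the intermediate side-outputs).

    base_w stands in for the base encoder's W-space output;
    scale_codes[size] stands in for the scale-specific encoder's output.
    """
    # Start from the base vector replicated across all style layers.
    w_plus = [list(base_w) for _ in range(NUM_LAYERS)]
    for size in upsampling_steps():
        # Replace the base vector in the layers belonging to this scale.
        for layer in layers_for_resolution(size):
            w_plus[layer] = list(scale_codes[size])
        # Copy so that later replacements do not alter earlier side-outputs.
        yield size, [list(v) for v in w_plus]


if __name__ == "__main__":
    steps = upsampling_steps()
    base = [0.0] * 4  # toy 4-dimensional latent; StyleGAN uses 512 dimensions
    codes = {s: [float(s)] * 4 for s in steps}
    for size, w_plus in progressive_codes(base, codes):
        replaced = sum(1 for v in w_plus if v != base)
        print(f"{size}x{size}: {replaced} of {NUM_LAYERS} layers replaced")
```

Under these assumptions, the 64× problem (16 to 1024 per side) becomes six 2× steps, and each step yields a W+ code that could be decoded and compared against a correspondingly downsampled target for deep supervision.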