Xiaoming Li, Shiguang Zhang, Shangchen Zhou, Lei Zhang, Wangmeng Zuo
IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):5904-5917. doi: 10.1109/TPAMI.2022.3215251. Epub 2023 Apr 3.
Blind face restoration is a challenging task due to unknown, unsynthesizable, and complex degradation, yet it is valuable in many practical applications. To improve the performance of blind face restoration, recent works mainly treat two aspects, i.e., generic and specific restoration, separately. In particular, generic restoration attempts to restore results through general facial structure priors, but it cannot generalize to real-world degraded observations, owing to the limited capability of direct CNN mappings in learning blind restoration, and it also fails to exploit identity-specific details. In contrast, specific restoration aims to incorporate identity features from a reference of the same identity, where the requirement of a proper reference severely limits the application scenarios. In general, it is challenging to improve the photo-realism of blind restoration while adaptively handling both the generic and specific restoration scenarios with a single unified model. Instead of implicitly learning the mapping from a low-quality image to its high-quality counterpart, this paper proposes DMDNet, which explicitly memorizes generic and specific features through dual dictionaries. First, the generic dictionary learns general facial priors from high-quality images of any identity, while the specific dictionary stores identity-specific features for each person individually. Second, to handle degraded inputs with or without a specific reference, a dictionary transform module is proposed to read relevant details from the dual dictionaries, which are subsequently fused into the input features. Finally, multi-scale dictionaries are leveraged to benefit coarse-to-fine restoration. The whole framework, including the generic and specific dictionaries, is optimized in an end-to-end manner and can be flexibly deployed in different application scenarios. Moreover, a new high-quality dataset, termed CelebRef-HQ, is constructed to promote the exploration of specific face restoration in the high-resolution space. Experimental results demonstrate that the proposed DMDNet performs favorably against state-of-the-art methods in both quantitative and qualitative evaluations, and generates more photo-realistic results on real-world low-quality images. The code, models, and the CelebRef-HQ dataset will be publicly available at https://github.com/csxmli2016/DMDNet.
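To make the dictionary read-and-fuse step above concrete, the following is a minimal PyTorch sketch of how such a dictionary transform module might look. Everything here is an illustrative assumption rather than the authors' implementation: the class name DictionaryTransform, the number of atoms n_atoms, the dot-product attention used for reading, and the 1x1 convolution used for fusion are all hypothetical; the actual DMDNet architecture is in the linked repository.

```python
import torch
import torch.nn as nn


class DictionaryTransform(nn.Module):
    """Hypothetical sketch of a dual-dictionary read-and-fuse step.

    A learned generic dictionary of `n_atoms` feature atoms is shared
    across identities; an optional specific dictionary (feature atoms
    extracted from references of the same identity) can be passed in
    at call time. Names and shapes are assumptions, not DMDNet's code.
    """

    def __init__(self, channels: int, n_atoms: int = 256):
        super().__init__()
        # Generic dictionary: atoms learned from high-quality faces of any identity.
        self.generic_dict = nn.Parameter(torch.randn(n_atoms, channels))
        # 1x1 conv that fuses the retrieved details back into the input features.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def read(self, feat: torch.Tensor, dictionary: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); dictionary: (N, C).
        b, c, h, w = feat.shape
        q = feat.flatten(2).transpose(1, 2)                          # (B, HW, C) queries
        attn = torch.softmax(q @ dictionary.t() / c ** 0.5, dim=-1)  # (B, HW, N) weights
        out = attn @ dictionary                                      # (B, HW, C) retrieved details
        return out.transpose(1, 2).reshape(b, c, h, w)

    def forward(self, feat: torch.Tensor, specific_dict: torch.Tensor = None) -> torch.Tensor:
        # With a same-identity reference, read from the specific dictionary;
        # otherwise fall back to the shared generic dictionary.
        dictionary = specific_dict if specific_dict is not None else self.generic_dict
        details = self.read(feat, dictionary)
        return self.fuse(torch.cat([feat, details], dim=1))


# Usage sketch: the same module handles both restoration scenarios.
dt = DictionaryTransform(channels=64)
feat = torch.randn(2, 64, 32, 32)
out_generic = dt(feat)                       # no reference available
ref_atoms = torch.randn(128, 64)             # e.g., atoms pooled from same-identity references
out_specific = dt(feat, specific_dict=ref_atoms)
```

In the coarse-to-fine design described in the abstract, one would presumably instantiate such a module at several feature scales of the restoration network, each with its own dictionaries, so that coarse structure and fine texture are retrieved at the appropriate resolution.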