IEEE Trans Image Process. 2023;32:1978-1991. doi: 10.1109/TIP.2023.3261747.
Recently, face super-resolution methods guided by deep convolutional neural networks (CNNs) have made great progress in restoring degraded facial details by training jointly with facial priors. However, these methods have some obvious limitations. On the one hand, multi-task joint learning requires additional annotation of the dataset, and the introduced prior network significantly increases the computational cost of the model. On the other hand, the limited receptive field of CNNs reduces the fidelity and naturalness of the reconstructed facial images, yielding suboptimal results. In this work, we propose an efficient CNN-Transformer Cooperation Network (CTCNet) for face super-resolution, which uses a multi-scale connected encoder-decoder architecture as its backbone. Specifically, we first devise a novel Local-Global Feature Cooperation Module (LGCM), composed of a Facial Structure Attention Unit (FSAU) and a Transformer block, to promote consistent restoration of local facial details and global facial structure simultaneously. Then, we design an efficient Feature Refinement Module (FRM) to enhance the encoded features. Finally, to further improve the restoration of fine facial details, we present a Multi-scale Feature Fusion Unit (MFFU) that adaptively fuses features from different stages of the encoder. Extensive evaluations on various datasets show that the proposed CTCNet significantly outperforms other state-of-the-art methods. Source code will be available at https://github.com/IVIPLab/CTCNet.
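The abstract's point about the limited receptive field of CNNs can be made concrete with the standard receptive-field recurrence for stacked convolutions. The short sketch below is illustrative only and is not taken from the CTCNet code. The helper `receptive_field` and the example layer stacks are hypothetical, and it computes the *theoretical* receptive field. The effective receptive field is usually smaller in practice.

```python
from typing import Sequence, Tuple


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Theoretical receptive field, in input pixels, of a stack of
    convolutions given as (kernel_size, stride) pairs (hypothetical helper)."""
    rf = 1    # one output pixel of an empty stack sees one input pixel
    jump = 1  # spacing, in input pixels, between adjacent positions at this depth
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) steps
        jump *= stride             # striding spreads later steps further apart
    return rf


if __name__ == "__main__":
    # Ten stacked 3x3, stride-1 convolutions: growth is only linear in depth.
    plain = [(3, 1)] * 10
    print(receptive_field(plain))    # 21

    # Interleaving stride-2 downsampling, as an encoder might, grows it faster.
    encoder = [(3, 1), (3, 2)] * 3
    print(receptive_field(encoder))  # 29
```

Even 29 pixels covers only a small patch of, say, a hypothetical 128 x 128 face image. By contrast, a self-attention layer inside a Transformer block lets every spatial position attend to every other position, so a single layer already spans the whole feature map. This is the motivation the abstract gives for pairing a CNN-based unit with a Transformer block.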