IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7395-7411. doi: 10.1109/TPAMI.2022.3225788. Epub 2023 May 5.
Recent years have witnessed the tremendous success of generative adversarial networks (GANs) in synthesizing photo-realistic images. A GAN generator learns to compose realistic images and reproduce the real data distribution, and in doing so it spontaneously develops a hierarchical visual feature with multi-level semantics. In this work, we show that such a generative feature, learned from image synthesis, has great potential for solving a wide range of computer vision tasks, including both generative ones and, more importantly, discriminative ones. We first train an encoder by treating the pre-trained StyleGAN generator as a learned loss function. The visual features produced by our encoder, termed Generative Hierarchical Features (GH-Feat), align closely with the layer-wise GAN representations and hence describe the input image adequately from the reconstruction perspective. Extensive experiments support the versatile transferability of GH-Feat across a range of applications, such as image editing, image processing, image harmonization, face verification, landmark detection, layout prediction, and image retrieval. We further show that, with a proper spatial expansion, GH-Feat can also facilitate fine-grained semantic segmentation using only a few annotations. Both qualitative and quantitative results demonstrate the appealing performance of GH-Feat. Code and models are available at https://genforce.github.io/ghfeat/.
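The idea of using a frozen generator as a learned loss function can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the "generator" is a fixed 4x2 linear map standing in for StyleGAN, the encoder is a single linear layer, the objective is plain pixel-wise squared error, and the names `G_WEIGHTS`, `generate`, `recon_loss`, and `train_encoder` are hypothetical. The only point it makes is that gradients flow through the generator while only the encoder is updated.

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Frozen toy "generator": maps a 2-D code to a 4-D "image".
# Assumption: a linear stand-in for a pre-trained StyleGAN generator.
G_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, -0.5]]

def generate(code):
    return matvec(G_WEIGHTS, code)

def recon_loss(encoder, images):
    """Mean squared reconstruction error of generate(encoder(x)) against x."""
    total = 0.0
    for x in images:
        x_hat = generate(matvec(encoder, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(images)

def train_encoder(images, steps=200, lr=0.05):
    """Fit a linear encoder by SGD; the generator weights are never updated."""
    encoder = [[0.0] * 4 for _ in range(2)]
    g_t = transpose(G_WEIGHTS)
    for _ in range(steps):
        for x in images:
            x_hat = generate(matvec(encoder, x))
            residual = [a - b for a, b in zip(x_hat, x)]
            # Backpropagate the residual through the frozen generator
            # to obtain the gradient with respect to the code (up to a factor 2).
            code_grad = matvec(g_t, residual)
            for i in range(2):
                for j in range(4):
                    encoder[i][j] -= lr * 2.0 * code_grad[i] * x[j]
    return encoder

rng = random.Random(0)
# Toy "real images": samples the generator itself can reproduce.
images = [generate([rng.uniform(-1, 1), rng.uniform(-1, 1)]) for _ in range(16)]
frozen_copy = [row[:] for row in G_WEIGHTS]

initial_loss = recon_loss([[0.0] * 4 for _ in range(2)], images)
encoder = train_encoder(images)
final_loss = recon_loss(encoder, images)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.2e}")
```

In this sketch the encoder output plays the role of the feature vector; in the abstract's setting the encoder would instead predict one code per generator layer, which is what makes the resulting feature hierarchical.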