SuperstarGAN: Generative adversarial networks for image-to-image translation in large-scale domains.

Affiliations

School of Electrical and Electronics Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, South Korea.

School of Mechanical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul 06974, South Korea.

Publication Information

Neural Netw. 2023 May;162:330-339. doi: 10.1016/j.neunet.2023.02.042. Epub 2023 Mar 7.

Abstract

Image-to-image translation with generative adversarial networks (GANs) has been extensively studied in recent years. Among these models, StarGAN achieves image-to-image translation across multiple domains with a single generator, whereas conventional models require multiple generators. However, StarGAN has several limitations, including the lack of capacity to learn mappings among large-scale domains; furthermore, it can barely express small feature changes. To address these limitations, we propose an improved StarGAN, namely SuperstarGAN. We adopt the idea, first proposed in controllable GAN (ControlGAN), of training an independent classifier with data augmentation techniques to handle the overfitting problem in the classification of the StarGAN structure. Because a generator guided by a well-trained classifier can express small features belonging to the target domain, SuperstarGAN achieves image-to-image translation in large-scale domains. Evaluated on a face image dataset, SuperstarGAN demonstrated improved performance in terms of Fréchet inception distance (FID) and learned perceptual image patch similarity (LPIPS); specifically, compared to StarGAN, it reduced FID by 18.1% and LPIPS by 42.5%. Furthermore, an additional experiment with interpolated and extrapolated label values shows that SuperstarGAN can control the degree to which target-domain features are expressed in generated images. Finally, SuperstarGAN was successfully applied to an animal face dataset and a painting dataset, where it translates styles of animal faces (e.g., a cat to a tiger) and styles of painters (e.g., Hassam to Picasso), respectively, demonstrating its generality across datasets.
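To make the mechanism described in the abstract more concrete, below is a minimal, hypothetical PyTorch sketch of the core idea the paper adopts from ControlGAN: a domain classifier trained independently of the discriminator, on augmented real images, and then used to supervise the generator's translations. The network sizes, augmentation choices, loss function (BCEWithLogitsLoss for multi-attribute labels), and all names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

# Hypothetical hyper-parameters; the paper's actual settings are not reproduced here.
NUM_DOMAINS = 10      # number of target domains ("large-scale" in the paper)
IMG_CHANNELS = 3
IMG_SIZE = 64

# Data augmentation applied only to the classifier's training images,
# following the ControlGAN idea the paper adopts to reduce overfitting.
# (Applied batch-wise here for brevity; per-image application is also possible.)
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomResizedCrop(IMG_SIZE, scale=(0.8, 1.0)),
    T.ColorJitter(brightness=0.2, contrast=0.2),
])

class Classifier(nn.Module):
    """Independent domain classifier, trained separately from the discriminator."""
    def __init__(self, num_domains=NUM_DOMAINS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(IMG_CHANNELS, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(256, num_domains)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.head(h)  # raw domain logits

classifier = Classifier()
cls_opt = torch.optim.Adam(classifier.parameters(), lr=1e-4)
cls_loss_fn = nn.BCEWithLogitsLoss()  # assumed multi-label domain annotations

def classifier_step(real_images, domain_labels):
    """One training step of the independent classifier on augmented real images."""
    logits = classifier(augment(real_images))
    loss = cls_loss_fn(logits, domain_labels)
    cls_opt.zero_grad()
    loss.backward()
    cls_opt.step()
    return loss.item()

def generator_domain_loss(generator, real_images, target_labels):
    """Domain loss for the generator: the classifier (held fixed, i.e. its
    parameters are not updated here) scores the translated images, pushing
    the generator to express features of the target domain."""
    fake = generator(real_images, target_labels)
    return cls_loss_fn(classifier(fake), target_labels)
```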

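The abstract also reports controlling the degree of target-domain feature expression with interpolated and extrapolated label values. The hypothetical helper below illustrates that idea, assuming the generator accepts a continuous label vector; the paper's exact labeling scheme is not specified here.

```python
import torch

def scaled_target_labels(base_labels, alpha):
    """Scale the target-domain label vector by alpha to control how strongly
    the target features appear in the generated image:
      0 < alpha < 1  -> interpolation (weaker expression)
      alpha > 1      -> extrapolation (exaggerated expression)
    """
    return base_labels * alpha

# Hypothetical usage with a trained generator `G` and an input batch `x`:
# labels = torch.zeros(x.size(0), NUM_DOMAINS)
# labels[:, target_domain_index] = 1.0
# mild   = G(x, scaled_target_labels(labels, 0.5))   # interpolated label
# strong = G(x, scaled_target_labels(labels, 1.5))   # extrapolated label
```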
